ON THE UNIFORM ASYMPTOTIC VALIDITY OF SUBSAMPLING

AND THE BOOTSTRAP 

By Joseph P. Romano and Azeem M. Shaikh 

Stanford University and University of Chicago 

This paper provides conditions under which subsampling and the 
bootstrap can be used to construct estimators of the quantiles of the 
distribution of a root that behave well uniformly over a large class 
of distributions P. These results are then applied (i) to construct 
confidence regions that behave well uniformly over P in the sense that 
the coverage probability tends to at least the nominal level uniformly 
over P and (ii) to construct tests that behave well uniformly over P 
in the sense that the size tends to no greater than the nominal level 
uniformly over P. Without these stronger notions of convergence, the 
asymptotic approximations to the coverage probability or size may 
be poor, even in very large samples. Specific applications include the 
multivariate mean, testing moment inequalities, multiple testing, the 
empirical process and U-statistics.

1. Introduction. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random
variables with distribution $P \in \mathbf{P}$, and denote by $J_n(x, P)$ the distribution
of a real-valued root $R_n = R_n(X^{(n)}, P)$ under $P$. In statistics and econometrics,
it is often of interest to estimate certain quantiles of $J_n(x, P)$. Two
commonly used methods for this purpose are subsampling and the bootstrap.
This paper provides conditions under which these estimators behave
well uniformly over $\mathbf{P}$. More precisely, we provide conditions under which
subsampling and the bootstrap may be used to construct estimators $\hat{c}_n(\alpha_1)$
of the $\alpha_1$ quantiles of $J_n(x, P)$ and $\hat{c}_n(1 - \alpha_2)$ of the $1 - \alpha_2$ quantiles of
$J_n(x, P)$, satisfying

(1) $\liminf_{n \to \infty} \inf_{P \in \mathbf{P}} P\{\hat{c}_n(\alpha_1) \le R_n \le \hat{c}_n(1 - \alpha_2)\} \ge 1 - \alpha_1 - \alpha_2.$

Here, $\hat{c}_n(0)$ is understood to be $-\infty$, and $\hat{c}_n(1)$ is understood to be $+\infty$. For
the construction of two-sided confidence intervals of nominal level $1 - 2\alpha$ for
a real-valued parameter, we typically would consider $\alpha_1 = \alpha_2 = \alpha$, while for a
one-sided confidence interval of nominal level $1 - \alpha$ we would consider either
$\alpha_1 = 0$ and $\alpha_2 = \alpha$, or $\alpha_1 = \alpha$ and $\alpha_2 = 0$. In many cases, it is possible to
replace the $\liminf_{n \to \infty}$ and $\ge$ in (1) with $\lim_{n \to \infty}$ and $=$, respectively. These
results differ from those usually stated in the literature in that they require
the convergence to hold uniformly over $\mathbf{P}$ instead of just pointwise over $\mathbf{P}$.
The importance of this stronger notion of convergence when applying these
results is discussed further below.

As we will see, the result (1) may hold with $\alpha_1 = 0$ and $\alpha_2 = \alpha \in (0, 1)$,
but it may fail if $\alpha_2 = 0$ and $\alpha_1 = \alpha \in (0, 1)$, or the other way round. This
phenomenon arises when it is not possible to estimate $J_n(x, P)$ uniformly
well with respect to a suitable metric, but, in a sense to be made precise by
our results, it is possible to estimate it sufficiently well to ensure that (1)
still holds for certain choices of $\alpha_1$ and $\alpha_2$. Note that metrics compatible
with the weak topology are not sufficient for our purposes. In particular,
closeness of distributions with respect to such a metric does not ensure
closeness of quantiles. See Remark 2.7 for further discussion of this point. In
fact, closeness of distributions with respect to even stronger metrics, such
as the Kolmogorov metric, does not ensure closeness of quantiles either. For
this reason, our results rely heavily on Lemma A.1, which relates closeness
of distributions with respect to a suitable metric and coverage statements.

In contrast, the usual arguments for the pointwise asymptotic validity
of subsampling and the bootstrap rely on showing for each $P \in \mathbf{P}$ that
$\hat{c}_n(1 - \alpha)$ tends in probability under $P$ to the $1 - \alpha$ quantile of the limiting
distribution of $R_n$ under $P$. Because our results are uniform in $P \in \mathbf{P}$, we
must consider the behavior of $R_n$ and $\hat{c}_n(1 - \alpha)$ under arbitrary sequences
$\{P_n \in \mathbf{P} : n \ge 1\}$, under which the quantile estimators need not even settle
down. Thus, the results are not trivial extensions of the usual pointwise
asymptotic arguments.

The construction of $\hat{c}_n(\alpha)$ satisfying (1) is useful for constructing confidence
regions that behave well uniformly over $\mathbf{P}$. More precisely, our results
provide conditions under which subsampling and the bootstrap can be used
to construct confidence regions $C_n = C_n(X^{(n)})$ of level $1 - \alpha$ for a parameter
$\theta(P)$ that are uniformly consistent in level in the sense that

(2) $\liminf_{n \to \infty} \inf_{P \in \mathbf{P}} P\{\theta(P) \in C_n\} \ge 1 - \alpha.$

Our results are also useful for constructing tests $\phi_n = \phi_n(X^{(n)})$ of level $\alpha$
for a null hypothesis $P \in \mathbf{P}_0 \subseteq \mathbf{P}$ against the alternative $P \in \mathbf{P}_1 = \mathbf{P} \setminus \mathbf{P}_0$
that are uniformly consistent in level in the sense that

(3) $\limsup_{n \to \infty} \sup_{P \in \mathbf{P}_0} E_P[\phi_n] \le \alpha.$

In some cases, it is possible to replace the $\liminf_{n \to \infty}$ and $\ge$ in (2) or the
$\limsup_{n \to \infty}$ and $\le$ in (3) with $\lim_{n \to \infty}$ and $=$, respectively.

Confidence regions satisfying (2) are desirable because they ensure that
for every $\epsilon > 0$ there is an $N$ such that for $n > N$ we have that $P\{\theta(P) \in C_n\}$
is no less than $1 - \alpha - \epsilon$ for all $P \in \mathbf{P}$. In contrast, confidence regions that
are only pointwise consistent in level in the sense that

$\liminf_{n \to \infty} P\{\theta(P) \in C_n\} \ge 1 - \alpha$

for each fixed $P \in \mathbf{P}$ have the feature that there exists some $\epsilon > 0$ and
$\{P_n \in \mathbf{P} : n \ge 1\}$ such that $P_n\{\theta(P_n) \in C_n\}$ is less than $1 - \alpha - \epsilon$ infinitely
often. Likewise, tests satisfying (3) are desirable for analogous reasons. For
this reason, inferences based on confidence regions or tests that fail to satisfy
(2) or (3) may be very misleading in finite samples. Of course, as pointed
out by Bahadur and Savage (1956), there may be no nontrivial confidence
region or test satisfying (2) or (3) when $\mathbf{P}$ is sufficiently rich. For this reason,
we will have to restrict $\mathbf{P}$ appropriately in our examples. In the case of
confidence regions for or tests about the mean, for instance, we will have to
impose a very weak uniform integrability condition. See also Kabaila (1995),
Pötscher (2002), Leeb and Pötscher (2006a, 2006b), Pötscher (2009) for
related results in more complicated settings, including post-model selection,
shrinkage-estimators and ill-posed problems.

Some of our results on subsampling are closely related to results in Andrews
and Guggenberger (2010), which were developed independently and
at about the same time as our results. See the discussion on page 431 of
Andrews and Guggenberger (2010). Our results show that the question of
whether subsampling can be used to construct estimators $\hat{c}_n(\alpha)$ satisfying
(1) reduces to a single, succinct requirement on the asymptotic relationship
between the distribution of $J_n(x, P)$ and $J_b(x, P)$, where $b$ is the subsample
size, whereas the results of Andrews and Guggenberger (2010) require
the verification of a larger number of conditions. Moreover, we also
provide a converse, showing this requirement on the asymptotic relationship
between the distribution of $J_n(x, P)$ and $J_b(x, P)$ is also necessary in
the sense that, if the requirement fails, then for some nominal coverage
level, the uniform coverage statements fail. Thus our results are stated under
essentially the weakest possible conditions, yet are verifiable in a large
class of examples. On the other hand, the results of Andrews and Guggenberger
(2010) further provide a means of calculating the limiting value of
$\inf_{P \in \mathbf{P}} P\{\hat{c}_n(\alpha_1) \le R_n \le \hat{c}_n(1 - \alpha_2)\}$ in the case where it may not satisfy
(1). To the best of our knowledge, our results on the bootstrap are the first
to be stated at this level of generality. An important antecedent is Romano
(1989), who studies the uniform asymptotic behavior of confidence regions
for a univariate cumulative distribution function. See also Mikusheva (2007),
who analyzes the uniform asymptotic behavior of some tests that arise in
the context of an autoregressive model.

The remainder of the paper is organized as follows. In Section 2, we
present the conditions under which $\hat{c}_n(\alpha)$ satisfying (1) may be constructed
using subsampling or the bootstrap. We then provide in Section 3 several
applications of our general results. These applications include the multivariate
mean, testing moment inequalities, multiple testing, the empirical process
and U-statistics. The discussion of U-statistics is especially noteworthy
because it highlights the fact that the assumptions required for the uniform
asymptotic validity of subsampling and the bootstrap may differ. In particular,
subsampling may be uniformly asymptotically valid under conditions
where, as noted by Bickel and Freedman (1981), the bootstrap fails even
to be pointwise asymptotically valid. The application to multiple testing
is also noteworthy because, despite the enormous recent literature in this
area, our results appear to be the first that provide uniformly asymptotically
valid inference. Proofs of the main results (Theorems 2.1 and 2.4) can
be found in the Appendix; proofs of all other results can be found in Romano
and Shaikh (2012), which contains supplementary material. Many of the
intermediate results may be of independent interest, including uniform weak
laws of large numbers for U-statistics and V-statistics [Lemmas S.17.3 and
S.17.4 in Romano and Shaikh (2012), resp.] as well as the aforementioned
Lemma A.1.

2. General results. 

2.1. Subsampling. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random
variables with distribution $P \in \mathbf{P}$. Denote by $J_n(x, P)$ the distribution
of a real-valued root $R_n = R_n(X^{(n)}, P)$ under $P$. The goal is to construct
procedures which are valid uniformly in $P$. In order to describe the subsampling
approach to approximate $J_n(x, P)$, let $b = b_n < n$ be a sequence
of positive integers tending to infinity, but satisfying $b/n \to 0$, and define
$N_n = \binom{n}{b}$. For $i = 1, \ldots, N_n$, denote by $X^{n,(b),i}$ the $i$th subset of data of
size $b$. Below, we present results for two subsampling-based estimators of
$J_n(x, P)$. We first consider the estimator given by

(4) $L_n(x, P) = \frac{1}{N_n} \sum_{1 \le i \le N_n} I\{R_b(X^{n,(b),i}, P) \le x\}.$

More generally, we will also consider feasible estimators $\hat{L}_n(x)$ in which $R_b$
is replaced by some estimator $\hat{R}_b$, that is,

(5) $\hat{L}_n(x) = \frac{1}{N_n} \sum_{1 \le i \le N_n} I\{\hat{R}_b(X^{n,(b),i}) \le x\}.$







Typically, $\hat{R}_b(\cdot) = R_b(\cdot, \hat{P}_n)$, where $\hat{P}_n$ is the empirical distribution, but this
is not assumed below. Even though the estimator of $J_n(x, P)$ defined in (4)
is infeasible because of its dependence on $P$, which is unknown, it is useful
both as an intermediate step toward establishing some results for the feasible
estimator of $J_n(x, P)$ and, as explained in Remarks 2.2 and 2.3, on its own
in the construction of some feasible tests and confidence regions.
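As a rough illustration of the feasible estimator in (5), the following Python sketch computes an approximation to $\hat{L}_n$ and its quantiles for the root $\sqrt{n}(\bar{X}_n - \mu(P))$. Two choices here are assumptions made for illustration and are not taken from the text: the root is the unstudentized mean, and the average over all $N_n$ subsets is approximated by `num_subsets` randomly drawn subsets. The function names are hypothetical.

```python
import math
import random


def subsample_roots(data, b, num_subsets=2000, seed=0):
    """Sorted values of sqrt(b) * (mean of subsample - mean of full sample).

    Assumption: num_subsets random size-b subsets stand in for all N_n subsets.
    """
    rng = random.Random(seed)
    full_mean = sum(data) / len(data)
    roots = []
    for _ in range(num_subsets):
        subset = rng.sample(data, b)  # size-b subset, drawn without replacement
        roots.append(math.sqrt(b) * (sum(subset) / b - full_mean))
    roots.sort()
    return roots


def subsample_quantile(sorted_roots, p):
    """inf{x : L_hat(x) >= p}, with the conventions -inf at p=0 and +inf at p=1."""
    if p <= 0:
        return -math.inf
    if p >= 1:
        return math.inf
    index = math.ceil(p * len(sorted_roots)) - 1
    return sorted_roots[index]


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
    roots = subsample_roots(sample, b=25)
    print(subsample_quantile(roots, 0.05), subsample_quantile(roots, 0.95))
```

The quantile function mirrors the convention stated after (1), under which the estimated quantile at level 0 is $-\infty$ and at level 1 is $+\infty$.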

Theorem 2.1. Let $b = b_n < n$ be a sequence of positive integers tending
to infinity, but satisfying $b/n \to 0$, and define $L_n(x, P)$ as in (4). Then, the
following statements are true:

(i) If $\limsup_{n \to \infty} \sup_{P \in \mathbf{P}} \sup_{x \in \mathbf{R}} \{J_b(x, P) - J_n(x, P)\} \le 0$, then

(6) $\liminf_{n \to \infty} \inf_{P \in \mathbf{P}} P\{L_n^{-1}(\alpha_1, P) \le R_n \le L_n^{-1}(1 - \alpha_2, P)\} \ge 1 - \alpha_1 - \alpha_2$

holds for $\alpha_1 = 0$ and any $0 \le \alpha_2 < 1$.

(ii) If $\limsup_{n \to \infty} \sup_{P \in \mathbf{P}} \sup_{x \in \mathbf{R}} \{J_n(x, P) - J_b(x, P)\} \le 0$, then (6) holds
for $\alpha_2 = 0$ and any $0 \le \alpha_1 < 1$.

(iii) If $\lim_{n \to \infty} \sup_{P \in \mathbf{P}} \sup_{x \in \mathbf{R}} |J_b(x, P) - J_n(x, P)| = 0$, then (6) holds
for any $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$ satisfying $0 \le \alpha_1 + \alpha_2 < 1$.

Remark 2.1. It is typically easy to deduce from the conclusions of
Theorem 2.1 stronger results in which the $\liminf_{n \to \infty}$ and $\ge$ in (6) are
replaced by $\lim_{n \to \infty}$ and $=$, respectively. For example, in order to assert that
(6) holds with $\liminf_{n \to \infty}$ and $\ge$ replaced by $\lim_{n \to \infty}$ and $=$, respectively,
all that is required is that

$\lim_{n \to \infty} P\{L_n^{-1}(\alpha_1, P) \le R_n \le L_n^{-1}(1 - \alpha_2, P)\} = 1 - \alpha_1 - \alpha_2$

for some $P \in \mathbf{P}$. This can be verified using the usual arguments for the
pointwise asymptotic validity of subsampling. Indeed, it suffices to show for
some $P \in \mathbf{P}$ that $J_n(x, P)$ tends in distribution to a limiting distribution
$J(x, P)$ that is continuous at the appropriate quantiles. See Politis, Romano
and Wolf (1999) for details.

Remark 2.2. As mentioned earlier, $L_n(x, P)$ defined in (4) is infeasible
because it still depends on $P$, which is unknown, through $R_b(X^{n,(b),i}, P)$.
Even so, Theorem 2.1 may be used without modification to construct feasible
confidence regions for a parameter of interest $\theta(P)$ provided that $R_n(X^{(n)}, P)$,
and therefore $L_n(x, P)$, depends on $P$ only through $\theta(P)$. If this is the
case, then one may simply invert tests of the null hypotheses $\theta(P) = \theta$ for
all $\theta \in \Theta$ to construct a confidence region for $\theta(P)$. More concretely, suppose
$R_n(X^{(n)}, P) = R_n(X^{(n)}, \theta(P))$ and $L_n(x, P) = L_n(x, \theta(P))$. Whenever
we may apply part (i) of Theorem 2.1, we have that

$C_n = \{\theta \in \Theta : R_n(X^{(n)}, \theta) \le L_n^{-1}(1 - \alpha, \theta)\}$

satisfies (2). Similar conclusions follow from parts (ii) and (iii) of Theorem 2.1.

Remark 2.3. It is worth emphasizing that even though Theorem 2.1
is stated for roots, it is, of course, applicable in the special case where
$R_n(X^{(n)}, P) = T_n(X^{(n)})$. This is especially useful in the context of hypothesis
testing. See Example 3.3 for one such instance.

Next, we provide some results for feasible estimators of J n (x, P). The first 
result, Corollary 2.1, handles the case of the most basic root, while Theo- 
rem 2.2 applies to more general roots needed for many of our applications. 

Corollary 2.1. Suppose $R_n = R_n(X^{(n)}, P) = \tau_n(\hat{\theta}_n - \theta(P))$, where
$\{\tau_n \in \mathbf{R} : n \ge 1\}$ is a sequence of normalizing constants, $\theta(P)$ is a real-valued
parameter of interest and $\hat{\theta}_n = \hat{\theta}_n(X^{(n)})$ is an estimator of $\theta(P)$. Let $b = b_n < n$
be a sequence of positive integers tending to infinity, but satisfying $b/n \to 0$,
and define

$\hat{L}_n(x) = \frac{1}{N_n} \sum_{1 \le i \le N_n} I\{\tau_b(\hat{\theta}_b(X^{n,(b),i}) - \hat{\theta}_n) \le x\}.$

Then statements (i)-(iii) of Theorem 2.1 hold when $L_n^{-1}(\cdot, P)$ is replaced by
$\frac{\tau_n}{\tau_n + \tau_b} \hat{L}_n^{-1}(\cdot)$.



Theorem 2.2. Let $b = b_n < n$ be a sequence of positive integers tending
to infinity, but satisfying $b/n \to 0$. Define $L_n(x, P)$ as in (4) and $\hat{L}_n(x)$ as
in (5). Suppose for all $\epsilon > 0$ that

(7) $\sup_{P \in \mathbf{P}} P\{\sup_{x \in \mathbf{R}} |\hat{L}_n(x) - L_n(x, P)| > \epsilon\} \to 0.$

Then, statements (i)-(iii) of Theorem 2.1 hold when $L_n^{-1}(\cdot, P)$ is replaced by
$\hat{L}_n^{-1}(\cdot)$.

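As a special case, Theorem 2.2 can be applied to Studentized roots.

Corollary 2.2. Suppose

$R_n = R_n(X^{(n)}, P) = \frac{\tau_n(\hat{\theta}_n - \theta(P))}{\hat{\sigma}_n},$

where $\{\tau_n \in \mathbf{R} : n \ge 1\}$ is a sequence of normalizing constants, $\theta(P)$ is a real-valued
parameter of interest, $\hat{\theta}_n = \hat{\theta}_n(X^{(n)})$ is an estimator of $\theta(P)$, and
$\hat{\sigma}_n = \hat{\sigma}_n(X^{(n)}) \ge 0$ is an estimator of some parameter $\sigma(P) > 0$. Suppose
further that:

(i) The family of distributions $\{J_n(x, P) : n \ge 1, P \in \mathbf{P}\}$ is tight, and any
subsequential limiting distribution is continuous.

(ii) For any $\epsilon > 0$,

$\sup_{P \in \mathbf{P}} P\left\{\left|\frac{\hat{\sigma}_n}{\sigma(P)} - 1\right| > \epsilon\right\} \to 0.$

Let $b = b_n < n$ be a sequence of positive integers tending to infinity, but
satisfying $b/n \to 0$ and $\tau_b/\tau_n \to 0$. Define

$\hat{L}_n(x) = \frac{1}{N_n} \sum_{1 \le i \le N_n} I\left\{\frac{\tau_b(\hat{\theta}_b(X^{n,(b),i}) - \hat{\theta}_n)}{\hat{\sigma}_b(X^{n,(b),i})} \le x\right\}.$

Then statements (i)-(iii) of Theorem 2.1 hold when $L_n^{-1}(\cdot, P)$ is replaced by
$\hat{L}_n^{-1}(\cdot)$.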
Remark 2.4. One can take $\hat{\sigma}_n = \sigma(P)$ in Corollary 2.2. Since $\sigma(P)$ effectively
cancels out from both sides of the inequality in the event $\{R_n \le \hat{L}_n^{-1}(1 - \alpha)\}$,
such a root actually leads to a computationally feasible construction.
However, Corollary 2.2 still applies and shows that we can obtain
a positive result without the correction factor $\tau_n/(\tau_n + \tau_b)$ present in Corollary 2.1,
provided the conditions of Corollary 2.2 hold. For example, if for
some $\sigma(P)$, we have that $\tau_n(\hat{\theta}_n - \theta(P_n))/\sigma(P_n)$ is asymptotically standard
normal under any sequence $\{P_n \in \mathbf{P} : n \ge 1\}$, then the conditions hold.

Remark 2.5. In Corollaries 2.1 and 2.2, it is assumed that the rate of
convergence $\tau_n$ is known. This assumption may be relaxed using techniques
described in Politis, Romano and Wolf (1999).

We conclude this section with a result that establishes a converse for 
Theorems 2.1 and 2.2. 

Theorem 2.3. Let $b = b_n < n$ be a sequence of positive integers tending
to infinity, but satisfying $b/n \to 0$, and define $L_n(x, P)$ as in (4) and $\hat{L}_n(x)$
as in (5). Then the following statements are true:

(i) If $\limsup_{n \to \infty} \sup_{P \in \mathbf{P}} \sup_{x \in \mathbf{R}} \{J_b(x, P) - J_n(x, P)\} > 0$, then (6) fails
for $\alpha_1 = 0$ and some $0 \le \alpha_2 < 1$.

(ii) If $\limsup_{n \to \infty} \sup_{P \in \mathbf{P}} \sup_{x \in \mathbf{R}} \{J_n(x, P) - J_b(x, P)\} > 0$, then (6) fails
for $\alpha_2 = 0$ and some $0 \le \alpha_1 < 1$.

(iii) If $\liminf_{n \to \infty} \sup_{P \in \mathbf{P}} \sup_{x \in \mathbf{R}} |J_b(x, P) - J_n(x, P)| > 0$, then (6) fails
for some $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$ satisfying $0 \le \alpha_1 + \alpha_2 < 1$.

If, in addition, (7) holds for any $\epsilon > 0$, then statements (i)-(iii) above hold
when $L_n^{-1}(\cdot, P)$ is replaced by $\hat{L}_n^{-1}(\cdot)$.

2.2. Bootstrap. As before, let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence
of random variables with distribution $P \in \mathbf{P}$. Denote by $J_n(x, P)$ the distribution
of a real-valued root $R_n = R_n(X^{(n)}, P)$ under $P$. The goal remains
to construct procedures which are valid uniformly in $P$. The bootstrap approach
is to approximate $J_n(\cdot, P)$ by $J_n(\cdot, \hat{P}_n)$ for some estimator $\hat{P}_n$ of $P$.
Typically, $\hat{P}_n$ is the empirical distribution, but this is not assumed in Theorem 2.4
below. Because $\hat{P}_n$ need not a priori even lie in $\mathbf{P}$, it is necessary
to introduce a family $\mathbf{P}'$ in which $\hat{P}_n$ lies (at least with high probability).
In order for the bootstrap to succeed, we will require that $\rho(\hat{P}_n, P)$ be small
for some function (perhaps a metric) $\rho(\cdot, \cdot)$ defined on $\mathbf{P}' \times \mathbf{P}$. For any given
problem in which the theorem is applied, $\mathbf{P}$, $\mathbf{P}'$ and $\rho$ must be specified.
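To make the approximation of $J_n(\cdot, P)$ by $J_n(\cdot, \hat{P}_n)$ concrete, the sketch below takes $\hat{P}_n$ to be the empirical distribution and the root to be $\sqrt{n}(\bar{X}_n - \mu(P))$, then inverts the event $\{J_n^{-1}(\alpha_1, \hat{P}_n) \le R_n \le J_n^{-1}(1 - \alpha_2, \hat{P}_n)\}$ into an interval for the mean. The choice of root, the Monte Carlo approximation with `num_draws` resamples, and all function names are illustrative assumptions rather than part of the original text.

```python
import math
import random


def bootstrap_roots(data, num_draws=2000, seed=0):
    """Sorted draws of sqrt(n) * (mean of resample - mean of data)."""
    rng = random.Random(seed)
    n = len(data)
    center = sum(data) / n  # the mean of the empirical distribution
    roots = []
    for _ in range(num_draws):
        resample = rng.choices(data, k=n)  # i.i.d. draws from the empirical distribution
        roots.append(math.sqrt(n) * (sum(resample) / n - center))
    roots.sort()
    return roots


def quantile(sorted_roots, p):
    """inf{x : empirical CDF(x) >= p}, with -inf at p=0 and +inf at p=1."""
    if p <= 0:
        return -math.inf
    if p >= 1:
        return math.inf
    return sorted_roots[math.ceil(p * len(sorted_roots)) - 1]


def bootstrap_interval(data, alpha1, alpha2, num_draws=2000, seed=0):
    """All mu with q(alpha1) <= sqrt(n) * (mean - mu) <= q(1 - alpha2)."""
    n = len(data)
    mean = sum(data) / n
    roots = bootstrap_roots(data, num_draws, seed)
    lower = mean - quantile(roots, 1 - alpha2) / math.sqrt(n)
    upper = mean - quantile(roots, alpha1) / math.sqrt(n)
    return lower, upper


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(2.0, 1.0) for _ in range(100)]
    print(bootstrap_interval(sample, 0.05, 0.05))
```

Setting `alpha1=0` yields an infinite upper endpoint, which corresponds to the one-sided case discussed in the Introduction.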

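Theorem 2.4. Let $\rho(\cdot, \cdot)$ be a function on $\mathbf{P}' \times \mathbf{P}$, and let $\hat{P}_n$ be a
(random) sequence of distributions. Then, the following are true:

(i) Suppose $\limsup_{n \to \infty} \sup_{x \in \mathbf{R}} \{J_n(x, Q_n) - J_n(x, P_n)\} \le 0$ for any sequences
$\{Q_n \in \mathbf{P}' : n \ge 1\}$ and $\{P_n \in \mathbf{P} : n \ge 1\}$ satisfying $\rho(Q_n, P_n) \to 0$. If

(8) $\rho(\hat{P}_n, P_n) \overset{P_n}{\to} 0$ and $P_n\{\hat{P}_n \in \mathbf{P}'\} \to 1$

for any sequence $\{P_n \in \mathbf{P} : n \ge 1\}$, then

(9) $\liminf_{n \to \infty} \inf_{P \in \mathbf{P}} P\{J_n^{-1}(\alpha_1, \hat{P}_n) \le R_n \le J_n^{-1}(1 - \alpha_2, \hat{P}_n)\} \ge 1 - \alpha_1 - \alpha_2$

holds for $\alpha_1 = 0$ and any $0 \le \alpha_2 < 1$.

(ii) Suppose $\limsup_{n \to \infty} \sup_{x \in \mathbf{R}} \{J_n(x, P_n) - J_n(x, Q_n)\} \le 0$ for any sequences
$\{Q_n \in \mathbf{P}' : n \ge 1\}$ and $\{P_n \in \mathbf{P} : n \ge 1\}$ satisfying $\rho(Q_n, P_n) \to 0$. If
(8) holds for any sequence $\{P_n \in \mathbf{P} : n \ge 1\}$, then (9) holds for $\alpha_2 = 0$ and
any $0 \le \alpha_1 < 1$.

(iii) Suppose $\lim_{n \to \infty} \sup_{x \in \mathbf{R}} |J_n(x, Q_n) - J_n(x, P_n)| = 0$ for any sequences
$\{Q_n \in \mathbf{P}' : n \ge 1\}$ and $\{P_n \in \mathbf{P} : n \ge 1\}$ satisfying $\rho(Q_n, P_n) \to 0$. If (8) holds
for any sequence $\{P_n \in \mathbf{P} : n \ge 1\}$, then (9) holds for any $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$
satisfying $0 \le \alpha_1 + \alpha_2 < 1$.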

Remark 2.6. It is typically easy to deduce from the conclusions of
Theorem 2.4 stronger results in which the $\liminf_{n \to \infty}$ and $\ge$ in (9) are
replaced by $\lim_{n \to \infty}$ and $=$, respectively. For example, in order to assert that
(9) holds with $\liminf_{n \to \infty}$ and $\ge$ replaced by $\lim_{n \to \infty}$ and $=$, respectively,
all that is required is that

$\lim_{n \to \infty} P\{J_n^{-1}(\alpha_1, \hat{P}_n) \le R_n \le J_n^{-1}(1 - \alpha_2, \hat{P}_n)\} = 1 - \alpha_1 - \alpha_2$

for some $P \in \mathbf{P}$. This can be verified using the usual arguments for the
pointwise asymptotic validity of the bootstrap. See Politis, Romano and
Wolf (1999) for details.

Remark 2.7. In some cases, it is possible to construct estimators $\hat{J}_n(x)$
of $J_n(x, P)$ that are uniformly consistent over a large class of distributions
$\mathbf{P}$ in the sense that for any $\epsilon > 0$

(10) $\sup_{P \in \mathbf{P}} P\{\rho(\hat{J}_n(\cdot), J_n(\cdot, P)) > \epsilon\} \to 0,$

where $\rho$ is the Lévy metric or some other metric compatible with the weak
topology. Yet a result such as (10) is not strong enough to yield uniform
coverage statements such as those in Theorems 2.1 and 2.4. In other words,
such conclusions do not follow from uniform approximations of the distribution
of interest if the quality of the approximation is measured in terms
of metrics metrizing weak convergence. To see this, consider the following
simple example.

Example 2.1. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random
variables with distribution $P_\theta = \mathrm{Bernoulli}(\theta)$. Denote by $J_n(x, P_\theta)$ the distribution
of the root $R_n = \sqrt{n}(\hat{\theta}_n - \theta)$ under $P_\theta$, where $\hat{\theta}_n = \bar{X}_n$. Let $\hat{P}_n$
be the empirical distribution of $X^{(n)}$ or, equivalently, $P_{\hat{\theta}_n}$. Lemma S.1.1 in
Romano and Shaikh (2012) implies for any $\epsilon > 0$ that

(11) $\sup_{0 \le \theta \le 1} P_\theta\{\rho(J_n(\cdot, \hat{P}_n), J_n(\cdot, P_\theta)) > \epsilon\} \to 0,$

whenever $\rho$ is a metric compatible with the weak topology. Nevertheless, it
follows from the argument on page 78 of Romano (1989) that the coverage
statements in Theorem 2.4 fail to hold provided that $\alpha_1$ and $\alpha_2$ are
not both equal to zero. Indeed, consider part (i) of Theorem 2.4. Suppose $\alpha_1 = 0$
and $0 < \alpha_2 < 1$. For a given $n$ and $\delta > 0$, let $\theta_n = (1 - \delta)^{1/n}$. Under $P_{\theta_n}$, the
event $X_1 = \cdots = X_n = 1$ has probability $1 - \delta$. Moreover, whenever such an
event occurs, $R_n > J_n^{-1}(1 - \alpha_2, \hat{P}_n) = 0$. Therefore,
$P_{\theta_n}\{J_n^{-1}(\alpha_1, \hat{P}_n) \le R_n \le J_n^{-1}(1 - \alpha_2, \hat{P}_n)\} \le \delta$.
Since the choice of $\delta$ was arbitrary, it follows that

$\liminf_{n \to \infty} \inf_{0 \le \theta \le 1} P_\theta\{J_n^{-1}(\alpha_1, \hat{P}_n) \le R_n \le J_n^{-1}(1 - \alpha_2, \hat{P}_n)\} = 0.$

A similar argument establishes the result for parts (ii) and (iii) of Theorem 2.4.

On the other hand, when $\rho$ is the Kolmogorov metric, (11) holds when the
supremum over $0 \le \theta \le 1$ is replaced with a supremum over $\delta < \theta < 1 - \delta$ for
some $\delta > 0$. Moreover, when $\theta$ is restricted to such an interval, the coverage
statements in Theorem 2.4 hold as well.
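The failure described in Example 2.1 can be checked numerically. The Python sketch below computes the coverage probability $P_\theta\{R_n \le J_n^{-1}(1 - \alpha_2, \hat{P}_n)\}$ exactly, using the fact that under $P_\theta$ the count $n\bar{X}_n$ is Binomial($n$, $\theta$) and that the bootstrap root under $P_{\hat{\theta}_n}$ is $\sqrt{n}(B/n - \hat{\theta}_n)$ with $B$ Binomial($n$, $\hat{\theta}_n$). The function names and the small numerical tolerance are illustrative assumptions.

```python
import math


def binom_pmf(n, k, p):
    """P{Binomial(n, p) = k}."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def bootstrap_upper_count(n, k, level):
    """Smallest count q with P{Binomial(n, k/n) <= q} >= level."""
    p_hat = k / n
    cdf = 0.0
    for q in range(n + 1):
        cdf += binom_pmf(n, q, p_hat)
        if cdf >= level - 1e-12:  # tolerance guards against rounding in the sum
            return q
    return n


def coverage(n, theta, alpha2):
    """Exact P_theta{ R_n <= J_n^{-1}(1 - alpha2, P_hat_n) }."""
    total = 0.0
    for k in range(n + 1):
        q = bootstrap_upper_count(n, k, 1.0 - alpha2)
        # sqrt(n)*(k/n - theta) <= sqrt(n)*(q/n - k/n)  <=>  k - n*theta <= q - k
        if k - n * theta <= q - k:
            total += binom_pmf(n, k, theta)
    return total


if __name__ == "__main__":
    n, delta = 50, 0.05
    theta_n = (1.0 - delta) ** (1.0 / n)
    print("coverage at theta_n:", coverage(n, theta_n, 0.1))
    print("coverage at theta = 0.5:", coverage(200, 0.5, 0.1))
```

At $\theta_n = (1 - \delta)^{1/n}$ the computed coverage cannot exceed $\delta$, because the sample of all ones is never covered, whereas at an interior value such as $\theta = 0.5$ the coverage is close to the nominal level.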

3. Applications. Before proceeding, it is useful to introduce some notation
that will be used frequently throughout many of the examples below.
For a distribution $P$ on $\mathbf{R}^k$, denote by $\mu(P)$ the mean of $P$, by $\Sigma(P)$ the covariance
matrix of $P$, and by $\Omega(P)$ the correlation matrix of $P$. For $1 \le j \le k$,
denote by $\mu_j(P)$ the $j$th component of $\mu(P)$ and by $\sigma_j^2(P)$ the $j$th diagonal
element of $\Sigma(P)$. In all of our examples, $X^{(n)} = (X_1, \ldots, X_n)$ will be an i.i.d.
sequence of random variables with distribution $P$ and $\hat{P}_n$ will denote the
empirical distribution of $X^{(n)}$. As usual, we will denote by $\bar{X}_n = \mu(\hat{P}_n)$ the
usual sample mean, by $\hat{\Sigma}_n = \Sigma(\hat{P}_n)$ the usual sample covariance matrix and
by $\hat{\Omega}_n = \Omega(\hat{P}_n)$ the usual sample correlation matrix. For $1 \le j \le k$, denote
by $\bar{X}_{j,n}$ the $j$th component of $\bar{X}_n$ and by $S_{j,n}^2$ the $j$th diagonal element of
$\hat{\Sigma}_n$. Finally, we say that a family of distributions $\mathbf{Q}$ on the real line satisfies
the standardized uniform integrability condition if

(12) $\lim_{\lambda \to \infty} \sup_{Q \in \mathbf{Q}} E_Q\left[\left(\frac{Y - \mu(Q)}{\sigma(Q)}\right)^2 I\left\{\left|\frac{Y - \mu(Q)}{\sigma(Q)}\right| > \lambda\right\}\right] = 0.$

In the preceding expression, $Y$ denotes a random variable with distribution
$Q$. The use of the term standardized to describe (12) reflects the fact that
the variable $Y$ is centered around its mean and normalized by its standard
deviation.

3.1. Subsampling. 

Example 3.1 (Multivariate nonparametric mean). Let $X^{(n)} = (X_1, \ldots, X_n)$
be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on
$\mathbf{R}^k$. Suppose one wishes to construct a rectangular confidence region for
$\mu(P)$. For this purpose, a natural choice of root is

(13) $R_n(X^{(n)}, P) = \max_{1 \le j \le k} \frac{\sqrt{n}(\bar{X}_{j,n} - \mu_j(P))}{S_{j,n}}.$

In this setup, we have the following theorem:

Theorem 3.1. Denote by $\mathbf{P}_j$ the set of distributions formed from the $j$th
marginal distributions of the distributions in $\mathbf{P}$. Suppose $\mathbf{P}$ is such that (12)
is satisfied with $\mathbf{Q} = \mathbf{P}_j$ for all $1 \le j \le k$. Let $J_n(x, P)$ be the distribution
of the root (13). Let $b = b_n < n$ be a sequence of positive integers tending to
infinity, but satisfying $b/n \to 0$, and define $L_n(x, P)$ by (4). Then

(14) $\lim_{n \to \infty} \inf_{P \in \mathbf{P}} P\left\{L_n^{-1}(\alpha_1, P) \le \max_{1 \le j \le k} \frac{\sqrt{n}(\bar{X}_{j,n} - \mu_j(P))}{S_{j,n}} \le L_n^{-1}(1 - \alpha_2, P)\right\} = 1 - \alpha_1 - \alpha_2$

for any $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$ such that $0 \le \alpha_1 + \alpha_2 < 1$. Furthermore, (14)
remains true if $L_n^{-1}(\cdot, P)$ is replaced by $\hat{L}_n^{-1}(\cdot)$, where $\hat{L}_n(x)$ is defined by
(5) with $\hat{R}_b(X^{n,(b),i}) = R_b(X^{n,(b),i}, \hat{P}_n)$.
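For intuition about the feasible version of Theorem 3.1 with $\alpha_1 = 0$, the sketch below computes the subsampling quantile of the root (13), recentered at the full-sample mean as in $R_b(\cdot, \hat{P}_n)$, and converts it into simultaneous lower confidence bounds for the components of $\mu(P)$. Drawing `num_subsets` random subsets instead of all $N_n$, the representation of the data as a list of rows, and the function names are assumptions made for this illustration. Variances use divisor $m$, matching the covariance matrix of the empirical distribution.

```python
import math
import random


def column_means_sds(rows):
    """Per-coordinate mean and standard deviation (divisor m) of an m-by-k list."""
    m, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / m) for j in range(k)]
    return means, sds


def max_t_root(rows, center):
    """max_j sqrt(m) * (mean_j - center_j) / sd_j, as in root (13)."""
    means, sds = column_means_sds(rows)
    m = len(rows)
    return max(math.sqrt(m) * (means[j] - center[j]) / sds[j] for j in range(len(center)))


def lower_confidence_bounds(rows, b, alpha, num_subsets=1000, seed=0):
    """Bounds mean_j - q * sd_j / sqrt(n), with q the subsampling 1 - alpha quantile."""
    rng = random.Random(seed)
    means, sds = column_means_sds(rows)
    roots = sorted(max_t_root(rng.sample(rows, b), means) for _ in range(num_subsets))
    q = roots[math.ceil((1 - alpha) * num_subsets) - 1]
    n = len(rows)
    return [means[j] - q * sds[j] / math.sqrt(n) for j in range(len(means))]


if __name__ == "__main__":
    rng = random.Random(3)
    data = [[rng.gauss(0.0, 1.0), rng.gauss(5.0, 2.0)] for _ in range(200)]
    print(lower_confidence_bounds(data, b=20, alpha=0.05))
```

The bounds follow from rearranging $\max_j \sqrt{n}(\bar{X}_{j,n} - \mu_j)/S_{j,n} \le q$ into $\mu_j \ge \bar{X}_{j,n} - q S_{j,n}/\sqrt{n}$ for every $j$.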

Under suitable restrictions, Theorem 3.1 generalizes to the case where the
root is given by

(15) $R_n(X^{(n)}, P) = f(Z_n(P), \hat{\Omega}_n),$

where $f$ is a continuous, real-valued function and

(16) $Z_n(P) = \left(\frac{\sqrt{n}(\bar{X}_{1,n} - \mu_1(P))}{S_{1,n}}, \ldots, \frac{\sqrt{n}(\bar{X}_{k,n} - \mu_k(P))}{S_{k,n}}\right)'.$

In particular, we have the following theorem:
Theorem 3.2. Let $\mathbf{P}$ be defined as in Theorem 3.1. Let $J_n(x, P)$ be the
distribution of root (15), where $f$ is continuous.

(i) Suppose further that for all $x \in \mathbf{R}$

(17) $P_n\{f(Z_n(P_n), \Omega(\hat{P}_n)) \le x\} \to P\{f(Z, \Omega) \le x\},$

(18) $P_n\{f(Z_n(P_n), \Omega(\hat{P}_n)) < x\} \to P\{f(Z, \Omega) < x\}$

for any sequence $\{P_n \in \mathbf{P} : n \ge 1\}$ such that $Z_n(P_n) \overset{d}{\to} Z$ under $P_n$ and
$\Omega(\hat{P}_n) \overset{P_n}{\to} \Omega$, where $Z \sim N(0, \Omega)$. Then

(19) $\liminf_{n \to \infty} \inf_{P \in \mathbf{P}} P\{L_n^{-1}(\alpha_1, P) \le f(Z_n(P), \hat{\Omega}_n) \le L_n^{-1}(1 - \alpha_2, P)\} \ge 1 - \alpha_1 - \alpha_2$

for any $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$ such that $0 \le \alpha_1 + \alpha_2 < 1$.

(ii) Suppose further that if $Z \sim N(0, \Omega)$ for some $\Omega$ satisfying $\Omega_{j,j} = 1$
for all $1 \le j \le k$, then $f(Z, \Omega)$ is continuously distributed. Then, (19) remains
true if $L_n^{-1}(\cdot, P)$ is replaced by $\hat{L}_n^{-1}(\cdot)$, where $\hat{L}_n(x)$ is defined by (5)
with $\hat{R}_b(X^{n,(b),i}) = R_b(X^{n,(b),i}, \hat{P}_n)$. Moreover, the $\liminf_{n \to \infty}$ and $\ge$ may be
replaced by $\lim_{n \to \infty}$ and $=$, respectively.

In order to verify (17) and (18) in Theorem 3.2, it suffices to assume that
$f(Z, \Omega)$ is continuously distributed. Under the assumptions of the theorem,
however, $f(Z, \Omega)$ need not be continuously distributed. In this case, (17) and
(18) hold immediately for any $x$ at which $P\{f(Z, \Omega) \le x\}$ is continuous, but
require a further argument for $x$ at which $P\{f(Z, \Omega) \le x\}$ is discontinuous.
See, for example, the proof of Theorem 3.9, which relies on Theorem 3.8,
where the same requirement appears.

Example 3.2 (Constrained univariate nonparametric mean). Andrews
(2000) considers the following example. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d.
sequence of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}$. Suppose it is
known that $\mu(P) \ge 0$ for all $P \in \mathbf{P}$ and one wishes to construct a confidence
interval for $\mu(P)$. A natural choice of root in this case is

$R_n = R_n(X^{(n)}, P) = \sqrt{n}(\max\{\bar{X}_n, 0\} - \mu(P)).$

This root differs from the one considered in Theorem 3.1 and the ones discussed
in Theorem 3.2 in the sense that under weak assumptions on $\mathbf{P}$,

(20) $\limsup_{n \to \infty} \sup_{P \in \mathbf{P}} \sup_{x \in \mathbf{R}} \{J_b(x, P) - J_n(x, P)\} \le 0$

holds, but

(21) $\limsup_{n \to \infty} \sup_{P \in \mathbf{P}} \sup_{x \in \mathbf{R}} \{J_n(x, P) - J_b(x, P)\} \le 0$

fails to hold. To see this, suppose (12) holds with $\mathbf{Q} = \mathbf{P}$. Note that

$J_b(x, P) = P\{\max\{Z_b(P), -\sqrt{b}\mu(P)\} \le x\},$
$J_n(x, P) = P\{\max\{Z_n(P), -\sqrt{n}\mu(P)\} \le x\},$

where $Z_b(P) = \sqrt{b}(\bar{X}_b - \mu(P))$ and $Z_n(P) = \sqrt{n}(\bar{X}_n - \mu(P))$. Since
$\sqrt{b}\mu(P) \le \sqrt{n}\mu(P)$ for any $P \in \mathbf{P}$, $J_b(x, P) - J_n(x, P)$ is bounded from above by

$P\{\max\{Z_b(P), -\sqrt{n}\mu(P)\} \le x\} - J_n(x, P).$

It now follows from the uniform central limit theorem established by Lemma 3.3.1
of Romano and Shaikh (2008) and Theorem 2.11 of Bhattacharya
and Ranga Rao (1976) that (20) holds. It therefore follows from Theorem 2.1
that (6) holds with $\alpha_1 = 0$ and any $0 \le \alpha_2 < 1$. To see that (21) fails, suppose
further that $\{Q_n : n \ge 1\} \subseteq \mathbf{P}$, where $Q_n = N(h/\sqrt{n}, 1)$ for some $h > 0$. For
$Z \sim N(0, 1)$,

$J_n(x, Q_n) = P\{\max(Z, -h) \le x\},$
$J_b(x, Q_n) = P\{\max(Z, -h\sqrt{b}/\sqrt{n}) \le x\}.$

The left-hand side of (21) is therefore greater than or equal to

$\limsup_{n \to \infty} (P\{\max(Z, -h) \le x\} - P\{\max(Z, -h\sqrt{b}/\sqrt{n}) \le x\})$

for any $x$. In particular, if $-h < x < 0$, then the second term is zero for large
enough $n$, and so the limiting value is $P\{Z \le x\} = \Phi(x) > 0$. It therefore
follows from Theorem 2.3 that (6) fails for $\alpha_2 = 0$ and some $0 \le \alpha_1 < 1$. On
the other hand, (6) holds with $\alpha_2 = 0$ and any $0.5 < \alpha_1 < 1$. To see this,
consider any sequence $\{P_n \in \mathbf{P} : n \ge 1\}$ and the event $\{L_n^{-1}(\alpha_1, P_n) \le R_n\}$.
For the root in this example, this event is scale invariant. So, in calculating
the probability of this event, we may without loss of generality assume
$\sigma^2(P_n) = 1$. Since $\mu(P_n) \ge 0$, we have for any $x \ge 0$ that

$J_n(x, P_n) = P\{\max\{Z_n(P_n), -\sqrt{n}\mu(P_n)\} \le x\} = P\{Z_n(P_n) \le x\} \to \Phi(x)$

and similarly for $J_b(x, P_n)$. Using the usual subsampling arguments, it is
thus possible to show for $0.5 < \alpha_1 < 1$ that

$L_n^{-1}(\alpha_1, P_n) \overset{P_n}{\to} \Phi^{-1}(\alpha_1).$

The desired conclusion therefore follows from Slutsky's theorem. Arguing as
in the proof of Corollary 2.2 and Remark 2.4, it can be shown that the same
results hold when $L_n^{-1}(\cdot, P)$ is replaced by $\hat{L}_n^{-1}(\cdot)$, where $\hat{L}_n(x)$ is defined as
$L_n(x, P)$ is defined but with $\mu(P)$ replaced by $\bar{X}_n$.
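The failure of (21) under the local sequence $Q_n = N(h/\sqrt{n}, 1)$ can be verified by direct calculation, since both distribution functions have closed forms under normality. The following Python sketch evaluates $J_n(x, Q_n) - J_b(x, Q_n)$; the function names and the particular values of $h$, $n$ and $b$ are illustrative assumptions.

```python
import math


def std_normal_cdf(x):
    """Phi(x) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def root_cdf(x, shift):
    """P{max(Z, -shift) <= x} for Z ~ N(0, 1): zero below -shift, Phi(x) otherwise."""
    return std_normal_cdf(x) if x >= -shift else 0.0


def gap(x, h, n, b):
    """J_n(x, Q_n) - J_b(x, Q_n) when Q_n = N(h / sqrt(n), 1)."""
    return root_cdf(x, h) - root_cdf(x, h * math.sqrt(b / n))


if __name__ == "__main__":
    h, n, b = 1.0, 10 ** 6, 100
    for x in (-0.9, -0.5, -0.1, 0.5):
        print(x, gap(x, h, n, b), std_normal_cdf(x))
```

For $-h < x < -h\sqrt{b/n}$ the gap equals $\Phi(x)$, and it vanishes for $x \ge 0$, which is consistent with (6) failing for small $\alpha_1$ but holding for $0.5 < \alpha_1 < 1$.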
Example 3.3 (Moment inequalities). The generality of Theorem 2.1
illustrated in Example 3.2 is also useful when testing multisided hypotheses
about the mean. To see this, let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence
of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}^k$. Define
$\mathbf{P}_0 = \{P \in \mathbf{P} : \mu(P) \le 0\}$ and $\mathbf{P}_1 = \mathbf{P} \setminus \mathbf{P}_0$. Consider testing the null hypothesis
that $P \in \mathbf{P}_0$ versus the alternative hypothesis that $P \in \mathbf{P}_1$ at level $\alpha \in (0, 1)$.
Such hypothesis testing problems have recently received considerable attention
in the "moment inequality" literature in econometrics. See, for example,
Andrews and Soares (2010), Andrews and Guggenberger (2010), Andrews
and Barwick (2012), Bugni (2010), Canay (2010) and Romano and Shaikh
(2008, 2010). Theorem 2.1 may be used to construct tests that are uniformly
consistent in level in the sense that (3) holds under weak assumptions on $\mathbf{P}$.
Formally, we have the following theorem:

Theorem 3.3. Let $\mathbf{P}$ be defined as in Theorem 3.1. Let $J_n(x, P)$ be the
distribution of

$T_n(X^{(n)}) = \max_{1 \le j \le k} \frac{\sqrt{n}\bar{X}_{j,n}}{S_{j,n}}.$

Let $b = b_n < n$ be a sequence of positive integers tending to infinity, but
satisfying $b/n \to 0$, and define $L_n(x)$ by the right-hand side of (4) with
$R_n(X^{(n)}, P) = T_n(X^{(n)})$. Then, the test defined by

$\phi_n(X^{(n)}) = I\{T_n(X^{(n)}) > L_n^{-1}(1 - \alpha)\}$

satisfies (3) for any $0 < \alpha < 1$.

The argument used to establish Theorem 3.3 is essentially the same as
the one presented in Romano and Shaikh (2008) for

$T_n(X^{(n)}) = \sum_{1 \le j \le k} \max\{\sqrt{n}\bar{X}_{j,n}, 0\}^2,$

though Lemma S.6.1 in Romano and Shaikh (2012) is needed for establishing
(20) here because of Studentization. Related results are obtained by Andrews
and Guggenberger (2009).
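A minimal sketch of the test in Theorem 3.3 follows, assuming the Studentized maximum statistic $\max_j \sqrt{n}\bar{X}_{j,n}/S_{j,n}$ and, as a computational shortcut that is not part of the theorem, a random selection of `num_subsets` subsamples in place of all $N_n$. Note that the subsample statistics are not recentered, in line with Remark 2.3. The function names are hypothetical.

```python
import math
import random


def max_t_stat(rows):
    """max over j of sqrt(m) * mean_j / sd_j for an m-by-k list of rows."""
    m = len(rows)
    stats = []
    for j in range(len(rows[0])):
        mean = sum(r[j] for r in rows) / m
        sd = math.sqrt(sum((r[j] - mean) ** 2 for r in rows) / m)
        stats.append(math.sqrt(m) * mean / sd)
    return max(stats)


def subsampling_test(rows, b, alpha, num_subsets=500, seed=0):
    """Return True when T_n exceeds the subsampling 1 - alpha quantile of T_b."""
    rng = random.Random(seed)
    sub_stats = sorted(max_t_stat(rng.sample(rows, b)) for _ in range(num_subsets))
    critical = sub_stats[math.ceil((1 - alpha) * num_subsets) - 1]
    return max_t_stat(rows) > critical


if __name__ == "__main__":
    rng = random.Random(2)
    data = [[rng.gauss(1.0, 1.0), rng.gauss(-1.0, 1.0)] for _ in range(400)]
    print("reject null:", subsampling_test(data, b=20, alpha=0.05))
```

Because $T_b$ is computed at scale $\sqrt{b}$ while $T_n$ is computed at scale $\sqrt{n}$, a strictly positive mean drives $T_n$ far above the subsampling critical value, while strictly negative means drive it far below.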

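Example 3.4 (Multiple testing). We now illustrate the use of Theorem 2.1
to construct tests of multiple hypotheses that behave well uniformly
over a large class of distributions. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence
of random variables with distribution $P \in \mathbf{P}$ on $\mathbf{R}^k$, and consider
testing the family of null hypotheses

(22) $H_j : \mu_j(P) \le 0$ for $1 \le j \le k$

versus the alternative hypotheses

(23) $H_j' : \mu_j(P) > 0$ for $1 \le j \le k$

in a way that controls the familywise error rate at level $0 < \alpha < 1$ in the
sense that

(24) $\limsup_{n \to \infty} \sup_{P \in \mathbf{P}} \mathrm{FWER}_P \le \alpha,$

where

$\mathrm{FWER}_P = P\{\text{reject some } H_j \text{ with } \mu_j(P) \le 0\}.$

For $K \subseteq \{1, \ldots, k\}$, define $L_n(x, K)$ according to the right-hand side of (4)
with

$R_n(X^{(n)}, P) = \max_{j \in K} \frac{\sqrt{n}\bar{X}_{j,n}}{S_{j,n}},$

and consider the following stepwise multiple testing procedure:

Algorithm 3.1. Step 1: Set $K_1 = \{1, \ldots, k\}$. If

$\max_{j \in K_1} \frac{\sqrt{n}\bar{X}_{j,n}}{S_{j,n}} \le L_n^{-1}(1 - \alpha, K_1),$

then stop. Otherwise, reject any $H_j$ with

$\frac{\sqrt{n}\bar{X}_{j,n}}{S_{j,n}} > L_n^{-1}(1 - \alpha, K_1)$

and continue to Step 2 with

$K_2 = \left\{j \in K_1 : \frac{\sqrt{n}\bar{X}_{j,n}}{S_{j,n}} \le L_n^{-1}(1 - \alpha, K_1)\right\}.$

...

Step s: If

$\max_{j \in K_s} \frac{\sqrt{n}\bar{X}_{j,n}}{S_{j,n}} \le L_n^{-1}(1 - \alpha, K_s),$

then stop. Otherwise, reject any $H_j$ with

$\frac{\sqrt{n}\bar{X}_{j,n}}{S_{j,n}} > L_n^{-1}(1 - \alpha, K_s)$

and continue to Step $s + 1$ with

$K_{s+1} = \left\{j \in K_s : \frac{\sqrt{n}\bar{X}_{j,n}}{S_{j,n}} \le L_n^{-1}(1 - \alpha, K_s)\right\}.$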
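The stepwise procedure of Algorithm 3.1 can be sketched in Python as follows. As in the earlier illustrations, the use of `num_subsets` randomly drawn subsamples rather than all $N_n$ subsets is an assumption made for computational convenience, and the function names are hypothetical. The same subsamples are reused at every step so that the critical values $L_n^{-1}(1 - \alpha, K_s)$ are computed from a common set of subsample statistics.

```python
import math
import random


def t_stats(rows):
    """Vector of sqrt(m) * mean_j / sd_j for an m-by-k list of rows."""
    m = len(rows)
    out = []
    for j in range(len(rows[0])):
        mean = sum(r[j] for r in rows) / m
        sd = math.sqrt(sum((r[j] - mean) ** 2 for r in rows) / m)
        out.append(math.sqrt(m) * mean / sd)
    return out


def stepdown_rejections(rows, b, alpha, num_subsets=500, seed=0):
    """Indices j of the hypotheses H_j rejected by the stepwise procedure."""
    rng = random.Random(seed)
    full = t_stats(rows)
    sub = [t_stats(rng.sample(rows, b)) for _ in range(num_subsets)]
    active = set(range(len(full)))  # K_s: hypotheses not yet rejected
    rejected = set()
    while active:
        maxima = sorted(max(s[j] for j in active) for s in sub)
        critical = maxima[math.ceil((1 - alpha) * num_subsets) - 1]
        newly = {j for j in active if full[j] > critical}
        if not newly:  # the maximum over K_s does not exceed the critical value: stop
            break
        rejected |= newly
        active -= newly  # K_{s+1} keeps the indices that were not rejected
    return rejected


if __name__ == "__main__":
    rng = random.Random(4)
    data = [[rng.gauss(1.0, 1.0), rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)]
            for _ in range(400)]
    print(sorted(stepdown_rejections(data, b=20, alpha=0.05)))
```

Each pass through the loop corresponds to one step of the algorithm: the critical value shrinks as hypotheses are removed from the active set, which is what allows later steps to reject hypotheses that earlier steps could not.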
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We have the following theorem: 

Theorem 3.4. Let P be defined as in Theorem 3.1. Let b = b n <n be 
a sequence of positive integers tending to infinity, but satisfying b/n — > 0. 
Then, Algorithm 3.1 satisfies 

(25) limsupsupFWER P <a 

n— Kx) PgP 

for any < a < 1 . 

It is, of course, possible to extend the analysis in a straightforward way 
to two-sided testing. See also Romano and Shaikh (2010) for related re- 
sults about a multiple testing problem involving an infinite number of null 
hypotheses. 
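
The stepwise procedure of Algorithm 3.1 can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the standard deviation uses divisor n, and a random draw of `n_subsamples` subsets stands in for the average over all size-b subsets in (4).

```python
import math
import random


def mean_sd(rows, j):
    """Sample mean and standard deviation (divisor n, an assumption) of coordinate j."""
    n = len(rows)
    mean = sum(r[j] for r in rows) / n
    var = sum((r[j] - mean) ** 2 for r in rows) / n
    return mean, math.sqrt(var)


def max_statistic(rows, active):
    """max over j in `active` of sqrt(len(rows)) * mean_j / sd_j."""
    scale = math.sqrt(len(rows))
    vals = []
    for j in active:
        m, s = mean_sd(rows, j)
        vals.append(scale * m / s)
    return max(vals)


def subsample_quantile(data, b, active, level, n_subsamples, rng):
    """Approximate L_n^{-1}(level, K) from randomly drawn subsamples of size b."""
    stats = sorted(
        max_statistic(rng.sample(data, b), active) for _ in range(n_subsamples)
    )
    # Smallest order statistic whose empirical CDF value is at least `level`.
    return stats[max(0, math.ceil(level * n_subsamples) - 1)]


def stepwise_test(data, b, alpha, n_subsamples=500, seed=0):
    """Return the set of indices j whose null hypothesis H_j is rejected."""
    rng = random.Random(seed)
    k = len(data[0])
    t = {j: max_statistic(data, [j]) for j in range(k)}
    active, rejected = set(range(k)), set()
    while active:
        crit = subsample_quantile(
            data, b, sorted(active), 1 - alpha, n_subsamples, rng
        )
        newly = {j for j in active if t[j] > crit}
        if not newly:  # max over the active set is <= critical value: stop
            break
        rejected |= newly
        active -= newly  # K_{s+1}: hypotheses not yet rejected
    return rejected
```

Each pass recomputes the critical value over the hypotheses not yet rejected, which is what distinguishes the stepwise procedure from a single-step test based on $K_1$ alone.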

Example 3.5 (Empirical process on R). Let $X^{(n)} = (X_1, \ldots, X_n)$ be an
i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on
$\mathbf{R}$. Suppose one wishes to construct a confidence region for the
cumulative distribution function associated with $P$, that is,
$P\{(-\infty, t]\}$. For this purpose a natural choice of root is

(26) $\sup_{t \in \mathbf{R}} \sqrt{n}\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}|.$

In this setting, we have the following theorem: 

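Theorem 3.5. Fix any $\varepsilon \in (0, 1)$, and let

(27) $\mathbf{P} = \{P \text{ on } \mathbf{R} : \varepsilon < P\{(-\infty, t]\} < 1 - \varepsilon \text{ for some } t \in \mathbf{R}\}.$

Let $J_n(x, P)$ be the distribution of root (26). Then

(28) $\lim_{n \to \infty} \inf_{P \in \mathbf{P}} P\Bigl\{L_n^{-1}(\alpha_1, P) \le \sup_{t \in \mathbf{R}} \sqrt{n}\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}| \le L_n^{-1}(1 - \alpha_2, P)\Bigr\} = 1 - \alpha_1 - \alpha_2$

for any $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$ such that $0 \le \alpha_1 + \alpha_2 < 1$.
Furthermore, (28) remains true if $L_n^{-1}(\cdot, P)$ is replaced by
$\hat L_n^{-1}(\cdot)$, where $\hat L_n(x)$ is defined by (5) with
$\hat R_b(X^{n,(b),i}) = R_b(X^{n,(b),i}, \hat P_n)$.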
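
As a hedged illustration of the feasible version in Theorem 3.5, the sketch below computes subsample roots $\sqrt{b}\,\sup_t |\hat P_{b,i} - \hat P_n|$ and converts their upper quantile into a uniform band half-width. The function names are hypothetical, and randomly drawn subsamples are assumed to stand in for all size-b subsets.

```python
import bisect
import math
import random


def ecdf_at(sorted_sample, t):
    """Empirical CDF of `sorted_sample` evaluated at t."""
    return bisect.bisect_right(sorted_sample, t) / len(sorted_sample)


def sup_distance(sorted_a, sorted_b):
    """sup_t |F_a(t) - F_b(t)| for two empirical CDFs.

    Both are right-continuous step functions, so the supremum is attained
    at one of the pooled jump points.
    """
    points = sorted_a + sorted_b
    return max(abs(ecdf_at(sorted_a, t) - ecdf_at(sorted_b, t)) for t in points)


def subsampling_band_halfwidth(sample, b, alpha, n_subsamples=500, seed=0):
    """Half-width c / sqrt(n), with c the (1 - alpha) quantile of the subsample roots."""
    rng = random.Random(seed)
    n = len(sample)
    full = sorted(sample)
    roots = sorted(
        math.sqrt(b) * sup_distance(sorted(rng.sample(sample, b)), full)
        for _ in range(n_subsamples)
    )
    crit = roots[max(0, math.ceil((1 - alpha) * n_subsamples) - 1)]
    return crit / math.sqrt(n)
```

Taking $\alpha_1 = 0$ and $\alpha_2 = \alpha$, the band is the empirical distribution function plus or minus the returned half-width at every $t$.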

Example 3.6 (One sample U-statistics). Let $X^{(n)} = (X_1, \ldots, X_n)$ be an
i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on
$\mathbf{R}$. Suppose one wishes to construct a confidence region for

(29) $\theta(P) = \theta_h(P) = E_P[h(X_1, \ldots, X_m)],$

where $h$ is a symmetric kernel of degree $m$. The usual estimator of $\theta(P)$ in
this case is given by the U-statistic

$\hat\theta_n = \hat\theta_n(X^{(n)}) = \binom{n}{m}^{-1} \sum_c h(X_{i_1}, \ldots, X_{i_m}).$

Here, $\sum_c$ denotes summation over all $\binom{n}{m}$ subsets
$\{i_1, \ldots, i_m\}$ of $\{1, \ldots, n\}$.
A natural choice of root is therefore given by

(30) $R_n(X^{(n)}, P) = \sqrt{n}(\hat\theta_n - \theta(P)).$
In this setting, we have the following theorem: 

Theorem 3.6. Let

(31) $g(x, P) = g_h(x, P) = E_P[h(x, X_2, \ldots, X_m)] - \theta(P)$

and

(32) $\sigma_h^2(P) = m^2 \mathrm{Var}_P[g(X_i, P)].$

Suppose $\mathbf{P}$ satisfies the uniform integrability condition

(33) $\lim_{\lambda \to \infty} \sup_{P \in \mathbf{P}} E_P\Bigl[\frac{g^2(X_i, P)}{\sigma_h^2(P)} I\Bigl\{\Bigl|\frac{g(X_i, P)}{\sigma_h(P)}\Bigr| > \lambda\Bigr\}\Bigr] = 0$

and

(34) $\sup_{P \in \mathbf{P}} \frac{\mathrm{Var}_P[h(X_1, \ldots, X_m)]}{\sigma_h^2(P)} < \infty.$

Let $J_n(x, P)$ be the distribution of the root (30). Let $b = b_n < n$ be a sequence
of positive integers tending to infinity, but satisfying $b/n \to 0$, and define
$L_n(x, P)$ by (4). Then

(35) $\lim_{n \to \infty} \inf_{P \in \mathbf{P}} P\{L_n^{-1}(\alpha_1, P) \le \sqrt{n}(\hat\theta_n - \theta(P)) \le L_n^{-1}(1 - \alpha_2, P)\} = 1 - \alpha_1 - \alpha_2$

for any $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$ such that $0 \le \alpha_1 + \alpha_2 < 1$.
Furthermore, (35) remains true if $L_n^{-1}(\cdot, P)$ is replaced by
$\hat L_n^{-1}(\cdot)$, where $\hat L_n(x)$ is defined by (5) with
$\hat R_b(X^{n,(b),i}) = R_b(X^{n,(b),i}, \hat P_n)$.
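
To make the objects in Example 3.6 concrete, here is a small sketch that evaluates a U-statistic by brute force and inverts subsampling quantiles of the root into an interval for $\theta(P)$. Several pieces are assumptions of the sketch rather than part of the paper: the function names, the variance kernel used as an example, the use of randomly drawn subsamples, and centring the subsample roots at the full-sample U-statistic $\hat\theta_n$ (the theorem's feasible version centres at $\theta(\hat P_n)$).

```python
import math
import random
from itertools import combinations


def u_statistic(sample, kernel, m):
    """Average of `kernel` over all size-m subsets of `sample`."""
    total, count = 0.0, 0
    for subset in combinations(sample, m):
        total += kernel(*subset)
        count += 1
    return total / count


def variance_kernel(x, y):
    """Degree-2 symmetric kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


def subsampling_interval(sample, kernel, m, b, alpha1, alpha2,
                         n_subsamples=300, seed=0):
    """Interval for theta(P) from quantiles of sqrt(b) * (theta_b - theta_n)."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = u_statistic(sample, kernel, m)
    roots = sorted(
        math.sqrt(b) * (u_statistic(rng.sample(sample, b), kernel, m) - theta_hat)
        for _ in range(n_subsamples)
    )

    def quantile(p):
        if p <= 0:
            return -math.inf  # the 0 quantile is understood to be -infinity
        return roots[max(0, math.ceil(p * n_subsamples) - 1)]

    lower_q, upper_q = quantile(alpha1), quantile(1 - alpha2)
    # Invert lower_q <= sqrt(n) * (theta_hat - theta) <= upper_q for theta.
    return theta_hat - upper_q / math.sqrt(n), theta_hat - lower_q / math.sqrt(n)
```

The brute-force sum visits all $\binom{n}{m}$ subsets, so it is only practical for small $n$ and $m$.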

3.2. Bootstrap. 

Example 3.7 (Multivariate nonparametric mean). Let $X^{(n)} = (X_1, \ldots, X_n)$
be an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on
$\mathbf{R}^k$. Suppose one wishes to construct a rectangular confidence region
for $\mu(P)$. As described in Example 3.1, a natural choice of root in this case
is given by (13). In this setting, we have the following theorem, which is a
bootstrap counterpart to Theorem 3.1:

Theorem 3.7. Let $\mathbf{P}$ be defined as in Theorem 3.1. Let $J_n(x, P)$ be the
distribution of the root (13). Then

(36) $\lim_{n \to \infty} \inf_{P \in \mathbf{P}} P\Bigl\{J_n^{-1}(\alpha_1, \hat P_n) \le \max_{1 \le j \le k} \frac{\sqrt{n}(\bar X_{j,n} - \mu_j(P))}{S_{j,n}} \le J_n^{-1}(1 - \alpha_2, \hat P_n)\Bigr\} = 1 - \alpha_1 - \alpha_2$

for any $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$ such that $0 \le \alpha_1 + \alpha_2 < 1$.
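
A bootstrap region of the kind covered by Theorem 3.7 can be sketched as follows for the one-sided case $\alpha_1 = 0$, $\alpha_2 = \alpha$. This is an illustrative sketch only: the function names are hypothetical, the standard deviation uses divisor n, and the bootstrap distribution $J_n(\cdot, \hat P_n)$ is approximated by `n_boot` Monte Carlo resamples.

```python
import math
import random


def mean_sd(rows, j):
    """Sample mean and standard deviation (divisor n, an assumption) of coordinate j."""
    n = len(rows)
    mean = sum(r[j] for r in rows) / n
    var = sum((r[j] - mean) ** 2 for r in rows) / n
    return mean, math.sqrt(var)


def bootstrap_lower_bounds(data, alpha, n_boot=500, seed=0):
    """Simultaneous lower bounds for mu_1(P), ..., mu_k(P)."""
    rng = random.Random(seed)
    n, k = len(data), len(data[0])
    full = [mean_sd(data, j) for j in range(k)]
    roots = []
    for _ in range(n_boot):
        # Resample n rows with replacement from the empirical distribution.
        star = [data[rng.randrange(n)] for _ in range(n)]
        vals = []
        for j in range(k):
            m, s = mean_sd(star, j)
            # Bootstrap root: centre at the full-sample mean, Studentize
            # with the resample's own standard deviation.
            vals.append(math.sqrt(n) * (m - full[j][0]) / s)
        roots.append(max(vals))
    roots.sort()
    crit = roots[max(0, math.ceil((1 - alpha) * n_boot) - 1)]
    # {max_j sqrt(n)(mean_j - mu_j)/S_j <= crit} holds exactly when
    # mu_j >= mean_j - crit * S_j / sqrt(n) for every j.
    return [m - crit * s / math.sqrt(n) for m, s in full]
```

Because the root is a maximum over coordinates, a single critical value yields bounds that hold jointly for all $k$ coordinates.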

Theorem 3.7 generalizes in the same way that Theorem 3.1 generalizes. 
In particular, we have the following result: 

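Theorem 3.8. Let $\mathbf{P}$ be defined as in Theorem 3.1. Let $J_n(x, P)$ be the
distribution of the root (15). Suppose $f$ is continuous. Suppose further that
for all $x \in \mathbf{R}$

(37) $P_n\{f(Z_n(P_n), \Omega(\hat P_n)) \le x\} \to P\{f(Z, \Omega) \le x\},$

(38) $P_n\{f(Z_n(P_n), \Omega(\hat P_n)) < x\} \to P\{f(Z, \Omega) < x\}$

for any sequence $\{P_n \in \mathbf{P} : n \ge 1\}$ such that
$Z_n(P_n) \stackrel{d}{\to} Z$ under $P_n$ and $\Omega(P_n) \to \Omega$, where
$Z \sim N(0, \Omega)$. Then

(39) $\liminf_{n \to \infty} \inf_{P \in \mathbf{P}} P\{J_n^{-1}(\alpha_1, \hat P_n) \le f(Z_n(P), \hat\Omega_n) \le J_n^{-1}(1 - \alpha_2, \hat P_n)\} \ge 1 - \alpha_1 - \alpha_2$

for any $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$ such that $0 \le \alpha_1 + \alpha_2 < 1$.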

Example 3.8 (Moment inequalities). Let $X^{(n)} = (X_1, \ldots, X_n)$ be an
i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on
$\mathbf{R}^k$ and define $\mathbf{P}_0$ and $\mathbf{P}_1$ as in Example 3.3.
Andrews and Barwick (2012) propose testing the null hypothesis that
$P \in \mathbf{P}_0$ versus the alternative hypothesis that $P \in \mathbf{P}_1$
at level $\alpha \in (0, 1)$ using an "adjusted quasi-likelihood ratio" statistic
$T_n(X^{(n)})$ defined as follows:

$T_n(X^{(n)}) = \inf_{t \in \mathbf{R}^k : t \le 0} W_n(t)' \tilde\Omega_n^{-1} W_n(t).$

Here, $t \le 0$ is understood to mean that the inequality holds component-wise,

$W_n(t) = \Bigl(\frac{\sqrt{n}(\bar X_{1,n} - t_1)}{S_{1,n}}, \ldots, \frac{\sqrt{n}(\bar X_{k,n} - t_k)}{S_{k,n}}\Bigr)'$

and

(40) $\tilde\Omega_n = \max\{\varepsilon - \det(\hat\Omega_n), 0\} I_k + \hat\Omega_n,$

where $\varepsilon > 0$ and $I_k$ is the $k$-dimensional identity matrix. Andrews and
Barwick (2012) propose a procedure for constructing critical values for
$T_n(X^{(n)})$ that they term "refined moment selection." For illustrative
purposes, we instead consider in the following theorem a simpler construction.

Theorem 3.9. Let $\mathbf{P}$ be defined as in Theorem 3.1. Let $J_n(x, P)$ be the
distribution of the root

(41) $R_n(X^{(n)}, P) = \inf_{t \in \mathbf{R}^k : t \le 0} (Z_n(P) - t)' \tilde\Omega_n^{-1} (Z_n(P) - t),$

where $Z_n(P)$ is defined as in (16). Then, the test defined by

$\phi_n(X^{(n)}) = I\{T_n(X^{(n)}) > J_n^{-1}(1 - \alpha, \hat P_n)\}$

satisfies (3) for any $0 < \alpha < 1$.

Theorem 3.9 generalizes in a straightforward fashion to other choices of 
test statistics, including the one used in Theorem 3.3. On the other hand, 
even when the underlying choice of test statistic is the same, the first-order 
asymptotic properties of the tests in Theorems 3.9 and 3.3 will differ. For 
other ways of constructing critical values that are more similar to the con- 
struction given in Andrews and Barwick (2012), see Romano, Shaikh and 
Wolf (2012). 

Example 3.9 (Multiple testing). Theorem 2.4 may be used in the same
way that Theorem 2.1 was used in Example 3.4 to construct tests of multiple
hypotheses that behave well uniformly over a large class of distributions. To
see this, let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random variables
with distribution $P \in \mathbf{P}$ on $\mathbf{R}^k$, and again consider testing
the family of null hypotheses (22) versus the alternative hypotheses (23) in a
way that satisfies (24) for $\alpha \in (0, 1)$. For $K \subseteq \{1, \ldots, k\}$,
let $J_n(x, K, P)$ be the distribution of the root

$\max_{j \in K} \frac{\sqrt{n}(\bar X_{j,n} - \mu_j(P))}{S_{j,n}}$

under $P$, and consider the stepwise multiple testing procedure given by
Algorithm 3.1 with $L_n^{-1}(1 - \alpha, K_j)$ replaced by
$J_n^{-1}(1 - \alpha, K_j, \hat P_n)$. We have the following theorem, which is a
bootstrap counterpart to Theorem 3.4:

Theorem 3.10. Let $\mathbf{P}$ be defined as in Theorem 3.1. Then Algorithm 3.1
with $L_n^{-1}(1 - \alpha, K_j)$ replaced by $J_n^{-1}(1 - \alpha, K_j, \hat P_n)$
satisfies (25) for any $0 < \alpha < 1$.

It is, of course, possible to extend the analysis in a straightforward way
to two-sided testing.

Example 3.10 (Empirical process on R). Let $X^{(n)} = (X_1, \ldots, X_n)$ be
an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on
$\mathbf{R}$. Suppose one wishes to construct a confidence region for the
cumulative distribution function associated with $P$, that is,
$P\{(-\infty, t]\}$. As described in Example 3.5, a natural choice of root in
this case is given by (26). In this setting, we have the following theorem,
which is a bootstrap counterpart to Theorem 3.5:

Theorem 3.11. Fix any $\varepsilon \in (0, 1)$, and let $\mathbf{P}$ be defined as in
Theorem 3.5. Let $J_n(x, P)$ be the distribution of the root (26). Denote by
$\hat P_n$ the empirical distribution of $X^{(n)}$. Then

$\lim_{n \to \infty} \inf_{P \in \mathbf{P}} P\Bigl\{J_n^{-1}(\alpha_1, \hat P_n) \le \sup_{t \in \mathbf{R}} \sqrt{n}\,|\hat P_n\{(-\infty, t]\} - P\{(-\infty, t]\}| \le J_n^{-1}(1 - \alpha_2, \hat P_n)\Bigr\} = 1 - \alpha_1 - \alpha_2$

for any $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$ such that $0 \le \alpha_1 + \alpha_2 < 1$.

Some of the conclusions of Theorem 3.11 can be found in Romano (1989), 
though the method of proof given in Romano and Shaikh (2012) is quite 
different. 

Example 3.11 (One sample U-statistics). Let $X^{(n)} = (X_1, \ldots, X_n)$ be
an i.i.d. sequence of random variables with distribution $P \in \mathbf{P}$ on
$\mathbf{R}$ and let $h$ be a symmetric kernel of degree $m$. Suppose one wishes to
construct a confidence region for $\theta(P) = \theta_h(P)$ given by (29). As
described in Example 3.6, a natural choice of root in this case is given by
(30). Before proceeding, it is useful to introduce the following notation. For
an arbitrary kernel $\tilde h$, $\varepsilon > 0$ and $B > 0$, denote by
$\mathbf{P}_{\tilde h, \varepsilon, B}$ the set of all distributions $P$ on
$\mathbf{R}$ such that

(42) $E_P[|\tilde h(X_1, \ldots, X_m) - \theta_{\tilde h}(P)|^{\varepsilon}] \le B.$

Similarly, for an arbitrary kernel $\tilde h$ and $\delta > 0$, denote by
$\mathbf{S}_{\tilde h, \delta}$ the set of all distributions $P$ on $\mathbf{R}$
such that

(43) $\sigma_{\tilde h}^2(P) \ge \delta,$

where $\sigma_{\tilde h}^2(P)$ is defined as in (32). Finally, for an arbitrary
kernel $\tilde h$, $\varepsilon > 0$ and $B > 0$, let
$\bar{\mathbf{P}}_{\tilde h, \varepsilon, B}$ be the set of distributions $P$ on
$\mathbf{R}$ such that

$E_P[|\tilde h(X_{i_1}, \ldots, X_{i_m}) - \theta_{\tilde h}(P)|^{\varepsilon}] \le B,$

whenever $1 \le i_j \le n$ for all $1 \le j \le m$. Using this notation, we have the
following theorem:

Theorem 3.12. Define the kernel $h'$ of degree $2m$ according to the rule

(44) $h'(x_1, \ldots, x_{2m}) = h(x_1, \ldots, x_m) h(x_1, x_{m+2}, \ldots, x_{2m}) - h(x_1, \ldots, x_m) h(x_{m+1}, \ldots, x_{2m}).$

Suppose

$\mathbf{P} \subseteq \mathbf{P}_{h, 2+\delta, B} \cap \mathbf{S}_{h, \delta} \cap \mathbf{P}_{h', 1+\delta, B} \cap \bar{\mathbf{P}}_{h, 2+\delta, B}$

for some $\delta > 0$ and $B > 0$. Let $J_n(x, P)$ be the distribution of the root
$R_n$ defined by (30). Then

$\lim_{n \to \infty} \inf_{P \in \mathbf{P}} P\{J_n^{-1}(\alpha_1, \hat P_n) \le \sqrt{n}(\hat\theta_n - \theta(P)) \le J_n^{-1}(1 - \alpha_2, \hat P_n)\} = 1 - \alpha_1 - \alpha_2$

for any $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$ such that $0 \le \alpha_1 + \alpha_2 < 1$.

Note that the kernel $h'$ defined in (44) arises in the analysis of the
estimated variance of the U-statistic. Note further that the conditions on P in
Theorem 3.12 are stronger than the conditions on P in Theorem 3.6. While 
it may be possible to weaken the restrictions on P in Theorem 3.12 some, 
it is not possible to establish the conclusions of Theorem 3.12 under the 
conditions on P in Theorem 3.6. Indeed, as shown by Bickel and Freedman 
(1981), the bootstrap based on the root R n defined by (30) need not be even 
pointwise asymptotically valid under the conditions on P in Theorem 3.6. 



APPENDIX 

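A.1. Proof of Theorem 2.1.

Lemma A.1. If $F$ and $G$ are (nonrandom) distribution functions on $\mathbf{R}$,
then we have that:

(i) If $\sup_{x \in \mathbf{R}}\{G(x) - F(x)\} \le \varepsilon$, then $G^{-1}(1 - \alpha_2) \ge F^{-1}(1 - (\alpha_2 + \varepsilon))$.

(ii) If $\sup_{x \in \mathbf{R}}\{F(x) - G(x)\} \le \varepsilon$, then $G^{-1}(\alpha_1) \le F^{-1}(\alpha_1 + \varepsilon)$.

Furthermore, if $X \sim F$, it follows that:

(iii) If $\sup_{x \in \mathbf{R}}\{G(x) - F(x)\} \le \varepsilon$, then $P\{X \le G^{-1}(1 - \alpha_2)\} \ge 1 - (\alpha_2 + \varepsilon)$.

(iv) If $\sup_{x \in \mathbf{R}}\{F(x) - G(x)\} \le \varepsilon$, then $P\{X \ge G^{-1}(\alpha_1)\} \ge 1 - (\alpha_1 + \varepsilon)$.

(v) If $\sup_{x \in \mathbf{R}}|G(x) - F(x)| \le \frac{\varepsilon}{2}$, then $P\{G^{-1}(\alpha_1) \le X \le G^{-1}(1 - \alpha_2)\} \ge 1 - (\alpha_1 + \alpha_2 + \varepsilon)$.

If $G$ is a random distribution function on $\mathbf{R}$, then we have further that:

(vi) If $P\{\sup_{x \in \mathbf{R}}\{G(x) - F(x)\} \le \varepsilon\} \ge 1 - \delta$, then $P\{X \le G^{-1}(1 - \alpha_2)\} \ge 1 - (\alpha_2 + \varepsilon + \delta)$.

(vii) If $P\{\sup_{x \in \mathbf{R}}\{F(x) - G(x)\} \le \varepsilon\} \ge 1 - \delta$, then $P\{X \ge G^{-1}(\alpha_1)\} \ge 1 - (\alpha_1 + \varepsilon + \delta)$.

(viii) If $P\{\sup_{x \in \mathbf{R}}|G(x) - F(x)| \le \frac{\varepsilon}{2}\} \ge 1 - \delta$, then $P\{G^{-1}(\alpha_1) \le X \le G^{-1}(1 - \alpha_2)\} \ge 1 - (\alpha_1 + \alpha_2 + \varepsilon + \delta)$.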

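Proof. To see (i), first note that $\sup_{x \in \mathbf{R}}\{G(x) - F(x)\} \le \varepsilon$
implies that $G(x) - \varepsilon \le F(x)$ for all $x \in \mathbf{R}$. Thus,
$\{x \in \mathbf{R} : G(x) \ge 1 - \alpha_2\} = \{x \in \mathbf{R} : G(x) - \varepsilon \ge 1 - \alpha_2 - \varepsilon\} \subseteq \{x \in \mathbf{R} : F(x) \ge 1 - \alpha_2 - \varepsilon\}$,
from which it follows that
$F^{-1}(1 - (\alpha_2 + \varepsilon)) = \inf\{x \in \mathbf{R} : F(x) \ge 1 - \alpha_2 - \varepsilon\} \le \inf\{x \in \mathbf{R} : G(x) \ge 1 - \alpha_2\} = G^{-1}(1 - \alpha_2)$.
Similarly, to prove (ii), first note that
$\sup_{x \in \mathbf{R}}\{F(x) - G(x)\} \le \varepsilon$ implies that
$F(x) - \varepsilon \le G(x)$ for all $x \in \mathbf{R}$, so
$\{x \in \mathbf{R} : F(x) \ge \alpha_1 + \varepsilon\} = \{x \in \mathbf{R} : F(x) - \varepsilon \ge \alpha_1\} \subseteq \{x \in \mathbf{R} : G(x) \ge \alpha_1\}$.
Therefore,
$G^{-1}(\alpha_1) = \inf\{x \in \mathbf{R} : G(x) \ge \alpha_1\} \le \inf\{x \in \mathbf{R} : F(x) \ge \alpha_1 + \varepsilon\} = F^{-1}(\alpha_1 + \varepsilon)$.
To prove (iii), note that because
$\sup_{x \in \mathbf{R}}\{G(x) - F(x)\} \le \varepsilon$, it follows from (i) that
$\{X \le G^{-1}(1 - \alpha_2)\} \supseteq \{X \le F^{-1}(1 - (\alpha_2 + \varepsilon))\}$.
Hence,
$P\{X \le G^{-1}(1 - \alpha_2)\} \ge P\{X \le F^{-1}(1 - (\alpha_2 + \varepsilon))\} \ge 1 - (\alpha_2 + \varepsilon)$.
Using the same reasoning, (iv) follows from (ii) and the assumption that
$\sup_{x \in \mathbf{R}}\{F(x) - G(x)\} \le \varepsilon$. To see (v), note that

$P\{G^{-1}(\alpha_1) \le X \le G^{-1}(1 - \alpha_2)\} \ge 1 - P\{X < G^{-1}(\alpha_1)\} - P\{X > G^{-1}(1 - \alpha_2)\} \ge 1 - (\alpha_1 + \alpha_2 + \varepsilon),$

where the first inequality follows from the Bonferroni inequality, and the
second inequality follows from (iii) and (iv), each applied with
$\frac{\varepsilon}{2}$ in place of $\varepsilon$. To prove (vi), note that

$P\{X \le G^{-1}(1 - \alpha_2)\}$
$\ge P\{X \le G^{-1}(1 - \alpha_2) \cap \sup_{x \in \mathbf{R}}\{G(x) - F(x)\} \le \varepsilon\}$
$\ge P\{X \le F^{-1}(1 - (\alpha_2 + \varepsilon)) \cap \sup_{x \in \mathbf{R}}\{G(x) - F(x)\} \le \varepsilon\}$
$\ge P\{X \le F^{-1}(1 - (\alpha_2 + \varepsilon))\} - P\{\sup_{x \in \mathbf{R}}\{G(x) - F(x)\} > \varepsilon\}$
$\ge 1 - \alpha_2 - \varepsilon - \delta,$

where the second inequality follows from (i). A similar argument using (ii)
establishes (vii). Finally, (viii) follows from (vi) and (vii) by an argument
analogous to the one used to establish (v). □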
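
Parts (i) and (ii) of Lemma A.1 can be checked numerically on empirical distribution functions. The sketch below is only an illustration with hypothetical function names: it takes $F$ and $G$ to be the empirical CDFs of two samples, sets $\varepsilon$ to the exact one-sided supremum, and uses exact rational arithmetic so that the quantile $\inf\{x : F(x) \ge p\}$ is not affected by rounding.

```python
import bisect
import math
import random
from fractions import Fraction


def cdf(sorted_xs, t):
    """Empirical CDF at t, as an exact fraction."""
    return Fraction(bisect.bisect_right(sorted_xs, t), len(sorted_xs))


def quantile(sorted_xs, p):
    """inf{x : F(x) >= p} for the empirical CDF F of `sorted_xs`."""
    if p <= 0:
        return -math.inf
    if p > 1:
        return math.inf
    return sorted_xs[math.ceil(p * len(sorted_xs)) - 1]


def sup_excess(sorted_g, sorted_f):
    """sup_x {G(x) - F(x)}; at least 0 because both CDFs vanish far to the left."""
    points = sorted_g + sorted_f
    return max(Fraction(0), max(cdf(sorted_g, t) - cdf(sorted_f, t) for t in points))


def check_part_i(sorted_f, sorted_g, alpha2):
    """G^{-1}(1 - alpha2) >= F^{-1}(1 - (alpha2 + eps)) with eps = sup{G - F}."""
    eps = sup_excess(sorted_g, sorted_f)
    return quantile(sorted_g, 1 - alpha2) >= quantile(sorted_f, 1 - alpha2 - eps)


def check_part_ii(sorted_f, sorted_g, alpha1):
    """G^{-1}(alpha1) <= F^{-1}(alpha1 + eps) with eps = sup{F - G}."""
    eps = sup_excess(sorted_f, sorted_g)
    return quantile(sorted_g, alpha1) <= quantile(sorted_f, alpha1 + eps)
```

Passing the levels as `Fraction` values, for example `Fraction(1, 20)`, keeps every comparison exact.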

Lemma A.2. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random
variables with distribution $P$. Denote by $J_n(x, P)$ the distribution of a
real-valued root $R_n = R_n(X^{(n)}, P)$ under $P$. Let $N_n = \binom{n}{b}$,
$k_n = \lfloor n/b \rfloor$ and define $L_n(x, P)$ according to (4). Then, for any
$\varepsilon > 0$, we have that

(45) $P\Bigl\{\sup_{x \in \mathbf{R}} |L_n(x, P) - J_b(x, P)| > \varepsilon\Bigr\} \le \frac{1}{\varepsilon}\sqrt{\frac{2\pi}{k_n}}.$



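Proof. Let $\varepsilon > 0$ be given and define $S_n(x, P; X_1, \ldots, X_n)$ by

$\frac{1}{k_n} \sum_{1 \le i \le k_n} I\{R_b((X_{b(i-1)+1}, \ldots, X_{bi}), P) \le x\} - J_b(x, P).$

Denote by $\mathbb{S}_n$ the symmetric group with $n$ elements. Note that using this
notation, we may rewrite $L_n(x, P) - J_b(x, P)$ as

$Z_n(x, P; X_1, \ldots, X_n) = \frac{1}{n!} \sum_{\pi \in \mathbb{S}_n} S_n(x, P; X_{\pi(1)}, \ldots, X_{\pi(n)}).$

Note further that

$\sup_{x \in \mathbf{R}} |Z_n(x, P; X_1, \ldots, X_n)| \le \frac{1}{n!} \sum_{\pi \in \mathbb{S}_n} \sup_{x \in \mathbf{R}} |S_n(x, P; X_{\pi(1)}, \ldots, X_{\pi(n)})|,$

which is an average of $n!$ identically distributed random variables. It follows
that $P\{\sup_{x \in \mathbf{R}} |Z_n(x, P; X_1, \ldots, X_n)| > \varepsilon\}$ is
bounded above by

(46) $P\Bigl\{\frac{1}{n!} \sum_{\pi \in \mathbb{S}_n} \sup_{x \in \mathbf{R}} |S_n(x, P; X_{\pi(1)}, \ldots, X_{\pi(n)})| > \varepsilon\Bigr\}.$

Using Markov's inequality, (46) can be bounded by

(47) $\frac{1}{\varepsilon} E_P\Bigl[\sup_{x \in \mathbf{R}} |S_n(x, P; X_1, \ldots, X_n)|\Bigr] = \frac{1}{\varepsilon} \int_0^1 P\Bigl\{\sup_{x \in \mathbf{R}} |S_n(x, P; X_1, \ldots, X_n)| > u\Bigr\}\,du.$

We may use the Dvoretsky-Kiefer-Wolfowitz inequality to bound the right-hand
side of (47) by

$\frac{1}{\varepsilon} \int_0^1 2\exp\{-2 k_n u^2\}\,du = \frac{1}{\varepsilon}\sqrt{\frac{2\pi}{k_n}}\Bigl[\Phi(2\sqrt{k_n}) - \frac{1}{2}\Bigr] < \frac{1}{\varepsilon}\sqrt{\frac{2\pi}{k_n}},$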

which establishes (45). □ 
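
The size of the bound in (45) is easy to examine numerically. The sketch below, with hypothetical function names, integrates the Dvoretsky-Kiefer-Wolfowitz tail bound $2\exp\{-2 k_n u^2\}$ over $[0, 1]$ by the midpoint rule, compares it with the Gaussian closed form $\sqrt{2\pi/k_n}\,[\Phi(2\sqrt{k_n}) - 1/2]$, and evaluates the right-hand side of (45).

```python
import math


def dkw_integral(k_n, steps=20_000):
    """Midpoint-rule value of the integral of 2 * exp(-2 * k_n * u^2) over [0, 1]."""
    h = 1.0 / steps
    return h * sum(
        2.0 * math.exp(-2.0 * k_n * ((i + 0.5) * h) ** 2) for i in range(steps)
    )


def closed_form(k_n):
    """sqrt(2*pi/k_n) * (Phi(2*sqrt(k_n)) - 1/2), with Phi the standard normal CDF."""
    phi = 0.5 * (1.0 + math.erf(2.0 * math.sqrt(k_n) / math.sqrt(2.0)))
    return math.sqrt(2.0 * math.pi / k_n) * (phi - 0.5)


def bound_45(eps, k_n):
    """Right-hand side of (45): (1 / eps) * sqrt(2 * pi / k_n)."""
    return math.sqrt(2.0 * math.pi / k_n) / eps
```

The bound decays like $k_n^{-1/2}$, so for a fixed $\varepsilon$ it is informative only once the number of disjoint blocks $k_n = \lfloor n/b \rfloor$ is large relative to $1/\varepsilon^2$.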






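Lemma A.3. Let $X^{(n)} = (X_1, \ldots, X_n)$ be an i.i.d. sequence of random
variables with distribution $P \in \mathbf{P}$. Denote by $J_n(x, P)$ the
distribution of a real-valued root $R_n = R_n(X^{(n)}, P)$ under $P$. Let
$k_n = \lfloor n/b \rfloor$ and define $L_n(x, P)$ according to (4). Let

$\delta_{1,n}(\varepsilon, \gamma, P) = \frac{1}{\gamma\varepsilon}\sqrt{\frac{2\pi}{k_n}} + I\Bigl\{\sup_{x \in \mathbf{R}}\{J_b(x, P) - J_n(x, P)\} > (1 - \gamma)\varepsilon\Bigr\},$

$\delta_{2,n}(\varepsilon, \gamma, P) = \frac{1}{\gamma\varepsilon}\sqrt{\frac{2\pi}{k_n}} + I\Bigl\{\sup_{x \in \mathbf{R}}\{J_n(x, P) - J_b(x, P)\} > (1 - \gamma)\varepsilon\Bigr\},$

$\delta_{3,n}(\varepsilon, \gamma, P) = \frac{1}{\gamma\varepsilon}\sqrt{\frac{2\pi}{k_n}} + I\Bigl\{\sup_{x \in \mathbf{R}}|J_b(x, P) - J_n(x, P)| > (1 - \gamma)\varepsilon\Bigr\}.$

Then, for any $\varepsilon > 0$ and $\gamma \in (0, 1)$, we have that:

(i) $P\{R_n \le L_n^{-1}(1 - \alpha_2, P)\} \ge 1 - (\alpha_2 + \varepsilon + \delta_{1,n}(\varepsilon, \gamma, P))$;

(ii) $P\{R_n \ge L_n^{-1}(\alpha_1, P)\} \ge 1 - (\alpha_1 + \varepsilon + \delta_{2,n}(\varepsilon, \gamma, P))$;

(iii) $P\{L_n^{-1}(\alpha_1, P) \le R_n \le L_n^{-1}(1 - \alpha_2, P)\} \ge 1 - (\alpha_1 + \alpha_2 + \varepsilon + \delta_{3,n}(\varepsilon, \gamma, P))$.

Proof. Let $\varepsilon > 0$ and $\gamma \in (0, 1)$ be given. Note that

$P\Bigl\{\sup_{x \in \mathbf{R}}\{L_n(x, P) - J_n(x, P)\} > \varepsilon\Bigr\}$
$\le P\Bigl\{\sup_{x \in \mathbf{R}}\{L_n(x, P) - J_b(x, P)\} + \sup_{x \in \mathbf{R}}\{J_b(x, P) - J_n(x, P)\} > \varepsilon\Bigr\}$
$\le P\Bigl\{\sup_{x \in \mathbf{R}}\{L_n(x, P) - J_b(x, P)\} > \gamma\varepsilon\Bigr\} + I\Bigl\{\sup_{x \in \mathbf{R}}\{J_b(x, P) - J_n(x, P)\} > (1 - \gamma)\varepsilon\Bigr\}$
$\le \frac{1}{\gamma\varepsilon}\sqrt{\frac{2\pi}{k_n}} + I\Bigl\{\sup_{x \in \mathbf{R}}\{J_b(x, P) - J_n(x, P)\} > (1 - \gamma)\varepsilon\Bigr\},$

where the final inequality follows from Lemma A.2. Assertion (i) thus follows
from the definition of $\delta_{1,n}(\varepsilon, \gamma, P)$ and part (vi) of
Lemma A.1. Assertions (ii) and (iii) are established similarly. □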
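
The middle inequality in the proof of Lemma A.3 rests on a union bound together with the fact that $\sup_x\{J_b(x,P) - J_n(x,P)\}$ is nonrandom. A short derivation, with $A$ and $B$ introduced here purely as shorthand (they are not the paper's notation):

```latex
% Shorthand (ours): A = \sup_x \{L_n(x,P) - J_b(x,P)\} is random,
%                   B = \sup_x \{J_b(x,P) - J_n(x,P)\} is a constant.
\begin{align*}
\{A + B > \varepsilon\}
  &\subseteq \{A > \gamma\varepsilon\} \cup \{B > (1-\gamma)\varepsilon\}, \\
P\{A + B > \varepsilon\}
  &\le P\{A > \gamma\varepsilon\} + I\{B > (1-\gamma)\varepsilon\} \\
  &\le \frac{1}{\gamma\varepsilon}\sqrt{\frac{2\pi}{k_n}}
     + I\{B > (1-\gamma)\varepsilon\}.
\end{align*}
```

The inclusion holds because $A \le \gamma\varepsilon$ and $B \le (1-\gamma)\varepsilon$ together force $A + B \le \varepsilon$; the last step applies (45) with $\gamma\varepsilon$ in place of $\varepsilon$, using $A \le \sup_x |L_n(x,P) - J_b(x,P)|$.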

Proof of Theorem 2.1. To prove (i), note that by part (i) of Lemma A.3,
we have for any $\varepsilon > 0$ and $\gamma \in (0, 1)$ that

$\inf_{P \in \mathbf{P}} P\{R_n \le L_n^{-1}(1 - \alpha_2, P)\} \ge 1 - \Bigl(\alpha_2 + \varepsilon + \sup_{P \in \mathbf{P}} \delta_{1,n}(\varepsilon, \gamma, P)\Bigr),$

where

$\delta_{1,n}(\varepsilon, \gamma, P) = \frac{1}{\gamma\varepsilon}\sqrt{\frac{2\pi}{k_n}} + I\Bigl\{\sup_{x \in \mathbf{R}}\{J_b(x, P) - J_n(x, P)\} > (1 - \gamma)\varepsilon\Bigr\}.$

By the assumption on
$\sup_{P \in \mathbf{P}} \sup_{x \in \mathbf{R}}\{J_b(x, P) - J_n(x, P)\}$, we have
that $\sup_{P \in \mathbf{P}} \delta_{1,n}(\varepsilon, \gamma, P) \to 0$ for every
$\varepsilon > 0$. Thus, there exists a sequence $\varepsilon_n > 0$ tending to 0 so
that $\sup_{P \in \mathbf{P}} \delta_{1,n}(\varepsilon_n, \gamma, P) \to 0$. The
desired claim now follows from applying part (i) of Lemma A.3 to this sequence.
Assertions (ii) and (iii) follow in exactly the same way. □

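A.2. Proof of Theorem 2.4. We prove only (i). Similar arguments can
be used to establish (ii) and (iii). Let $\alpha_1 = 0$, $0 \le \alpha_2 < 1$ and
$\eta > 0$ be given. Choose $\delta > 0$ so that

$\sup_{x \in \mathbf{R}}\{J_n(x, P') - J_n(x, P)\} < \frac{\eta}{2},$

whenever $\rho(P', P) \le \delta$ for $P' \in \mathbf{P}'$ and $P \in \mathbf{P}$. For
$n$ sufficiently large, we have that

$\sup_{P \in \mathbf{P}} P\{\rho(\hat P_n, P) > \delta\} < \frac{\eta}{4}$ and $\sup_{P \in \mathbf{P}} P\{\hat P_n \notin \mathbf{P}'\} < \frac{\eta}{4}.$

For such $n$, we therefore have that

$1 - \frac{\eta}{2} \le \inf_{P \in \mathbf{P}} P\{\rho(\hat P_n, P) \le \delta \cap \hat P_n \in \mathbf{P}'\} \le \inf_{P \in \mathbf{P}} P\Bigl\{\sup_{x \in \mathbf{R}}\{J_n(x, \hat P_n) - J_n(x, P)\} \le \frac{\eta}{2}\Bigr\}.$

It follows from part (vi) of Lemma A.1 that for such $n$

$\inf_{P \in \mathbf{P}} P\{R_n \le J_n^{-1}(1 - \alpha_2, \hat P_n)\} \ge 1 - (\alpha_2 + \eta).$

Since the choice of $\eta$ was arbitrary, the desired result follows.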
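
For completeness, the application of part (vi) of Lemma A.1 in the last step can be written out. Here $F = J_n(\cdot, P)$, $G = J_n(\cdot, \hat P_n)$ and $X = R_n$, and the lemma's constants are taken to be $\varepsilon = \eta/2$ and $\delta = \eta/2$ (our reading of the argument):

```latex
\begin{align*}
P\Bigl\{\sup_{x \in \mathbf{R}}\{J_n(x,\hat P_n) - J_n(x,P)\} \le \tfrac{\eta}{2}\Bigr\}
  \ge 1 - \tfrac{\eta}{2}
\;\Longrightarrow\;
P\{R_n \le J_n^{-1}(1-\alpha_2,\hat P_n)\}
  &\ge 1 - \Bigl(\alpha_2 + \tfrac{\eta}{2} + \tfrac{\eta}{2}\Bigr) \\
  &= 1 - (\alpha_2 + \eta).
\end{align*}
```

Since the premise holds for every $P \in \mathbf{P}$ once $n$ is large enough, the conclusion holds uniformly over $\mathbf{P}$.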

SUPPLEMENTARY MATERIAL 

Supplement to "On the uniform asymptotic validity of subsampling and 
the bootstrap" (DOI: 10.1214/12-AOS1051SUPP; .pdf). The supplement 
provides additional details and proofs for many of the results in the authors' 
paper.